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ABSTRACT 

This document provides an administrator with the knowledge to set up and 
maintain DART automated testing installations for use in ranges of physical 
and ESXi virtual machines leveraged for cyber testing and verification. 
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1 DART Automated Test Execution Technology Overview 

1.1 What is Tyrant? 

Tyrant is the name given by Lockheed Martin Advanced Technology Labs to a suite of technology it 
developed for running automated software tests. It is one of the major components of the complete 
DART (Dynamic Automated Range Testing) tool suite. (The other major component handles building out 
cyber test ranges.) The Tyrant suite allows multiple simultaneous developers and testers to run tests in a 
reproducible manner across a wide array of resources. The Tyrant suite also allows administrators to 
manage these test resources. Using Tyrant technologies, one can encode the logic of a test to perform 
against one or more test resources as well as the logic to determine whether the test is a success or 
failure. One can then run this test against a specific set of resources (i.e. the resources needed to run a 
single instance of the test), or have multiple instances of the test run against a diverse range of resources 
to evaluate the functionality of a system-under-test against the whole range of systems it may be run on 
in the real world. Finally, multiple developers can perform these tests simultaneously against a shared 
set of resources without conflicting with each other. 

1.2 Technical Components Overview 

Tyrant is made up of the following central technical components: 

• Palantir (Testing Interface) 

• HAL (Hardware Abstraction Layer) 

• Undermine (Test Script Harness) + Test Scripts 

• Overmind (Test Script Scheduler) + Test Plans 

• Overview (Test Resource/Result GUI) 

• Reaper (Test Resource Sanitizer) 

• Remote Commit (Remote Job Submission) 

Each component is standalone from the others, allowing for each user to determine the combination of 
components most appropriate for their testing scenarios. 

The contents of this manual apply to DART Tyrant code released on May 6,2014. 

1.2.1 Palantir 

Palantir provides a common, cross-platform test interface to each of the computer resources in the test 
environment that are running commodity operating systems (e.g. Windows, Linux, OSX, FreeBSD, etc.). 
This allows test script writers to write more cross-platform test scripts, expecting a consistent interface 
(e.g. “put file”, “execute”, “spawn”, etc.) on each of the computers needed for the user’s test scenario. 
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Figure 1 Palantir provides control of each test resource 


1.2.2 Hardware Abstraction Layer (HAL) 

The HAL provides a framework for implementations of computer-level operations, i.e. operations done 
to the computer from outside of it, such as powering it on, powering it off or saving its state. These are 
all operations which can be performed without interacting with the operating system on the computer. 
Based on attributes of the computer being operated on, the HAL chooses the correct implementation of 
the logical operation desired (e.g. for a “power_off” operation, the HAL might use an ESXi control library 
for ESXi VMs, whereas for a physical computer, the HAL would use a configured PDU to power off the 
outlet the computer is connected to). For controlling saved states, various methods can be implemented 
as plugins to the HAL framework, such as reverting a VM snapshot for VMs, applying a disk image to 
physical machines, or using an auto-building system to automatically install an OS on a computer from 
scratch. 

1.2.3 Undermine + Test Scripts 

Undermine is a test script harness for executing the user’s test scenarios. The user will encode their test 
procedure in a “test script”. Then he/she will run that test script via Undermine, which will effectively 
automate their procedure. Undermine performs the actions on the test resources via Palantir if it exists 
on the test resource. 



Figure 2 Undermine executes test operations against resources 
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1.2.4 Overmind + Test Plans 

Overmind examines the set of test resources in the Test Resource section of the Overmind Database and 
schedules tests to be run across those resources. The user describes what tests they want to run, the 
resource constraints, and the desired iteration and/or replication of each test script in a “test plan”. 
Multiple users can submit test plans concurrently, and Overmind will schedule each possible test to run 
when possible. Overmind runs an instance of Undermine for each test that needs to be performed. 
When tests are completed Overmind puts the test results into the Test Result section of the Overmind 
Database and marks the resource as requiring sanitization in the Test Resource section of the Overmind 
Database. 



Figure 3 Overmind schedules tests across a range of resources 


1.2.5 Overview 

Overview is the web-based GUI that allows a user to view currently running and past test results in the 
Test Result section of the Overmind Database previously inserted by Overmind as well as graphically 
manage the resources in the Test Resource section of the Overmind Database. 
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1.2.6 Reaper 

Reaper monitors the Test Resources section of the Overmind Database for resources that require 
sanitization and uses the HAL to restore a previously-saved, sanitized state to the computer. Different 
sanitization methods can be specified for each resource. After the sanitization process completes, 
Reaper marks the resource as clean in the Test Resource section of the Overmind Database, indicating to 
Overmind that the resource can be considered for use in a test again. 



Figure 5 Reaper sanitizes resources for testing 
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1.2.7 Remote Commit (Remote Job Submission) 

While there are many ways to install and configure the Tyrant software, one very common setup is the 
one that allows for multiple users to submit and run their own tests on a set of common testing 
resources. 

In this scenario each user has their own local copy of Overmind, Undermine, his/her tool or System 
Under Test (SUT), and test scripts and test plans relative to that SUT. Each user runs the 
remote_commit tool to copy those five components from their local box to the main testing server, 
start Overmind, and submit the necessary test plans. This ensures that each user can be running 
completely different tests from every other user. This also allows each user to continue their 
development of any of those five components without affecting currently running tests. 

Each user can then browse to the Overview web GUI to monitor progress of their running tests. 

The following picture illustrates all of the Tyrant components working together to execute this CONOR 
(“P” == Palantir, “UM” == Undermine, “OM” == Overmind) 
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Figure 6 Remote commit allows parallel development and testing 
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1.2.8 Plunger (Database Cleanup) 

There are multiple ways to condense, delete and purge records from the Tyrant database. One approach 
is to run bin/db_admin commands such as del_reservation_history, 
condense_contentions, and del_contentions. Another is to run the extendable service, 
Plunger. 

The administrator can determine what database condense and cleanup tasks to allow Plunger to run by 
setting options in the configuration file, plunger.rc. He or she can also add plunger modules in the 
plunger_extensions section of the configuration file to perform additional database maintenance 
functions. 

1.3 For the Administrator 

For the range administrator, Tyrant provides a set of tools to manage a large range of resources (typically, 
but not necessarily, virtual machines) for use with automated tests. Tyrant provides the tools to make 
this range available to developers and testers and coordinate access to its resources to prevent conflicts 
between tests. This manual covers how to set up ESXi virtual machine and physical resources to be 
usable for tests and how to set up the Tyrant technologies to make these resources into a range available 
for automated tests. 

While detailed documentation is provided, nearly every tool accepts a command line "-h” option that 
will have the tool print out its usage. 

1.4 Recommended Hardware 

This section describes some recommendations for hardware to use based on desired range size. DART 
would definitely work on less capable hardware than what is listed below; however, the system’s 
throughput would suffer in various ways or features would be lost. 
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Computer Reservation History 



Computer Reservation History Metrics 



Tyrant Metrics shows recommended hardware sets for some common virtual machine range sizes, with 
Dell PowerEdge model numbers as examples of systems meeting our recommendations. 

2.3.1 Test Server 

The test server, on which the logic of test scripts is run, needs disk capacity for test output files, and CPU 
and RAM capacity since each running test instance is a separate process. Disk speed is of little concern 
because the bottleneck on this server is in processing. As the number of tests you plan on ever running 
increases, more hard disk capacity is needed (as would be the case for a program which runs for a long 
time or has heavy active development). As the number of tests you run concurrently increases (e.g. 
more developers working and testing at once), greater CPU and RAM capacity is needed. 

2.3.2 ESXi Infrastructure 

The ESXi servers, which actually run the VMs, need disk capacity, disk performance, and CPU 
performance and RAM capacity. The combination of these factors influence how many VMs each server 
can successfully run. As the raw number of VMs or number of VM snapshots you wish to have on the 
server increases, the needed hard drive capacity increases. As the number of VMs you wish to actually 
have running at once increases (which will usually be at least loosely correlated with the raw number of 

13 


UNCLASSIFIED 



UNCLASSIFIED 


VMs), the need for disk performance and CPU and RAM capacity increases. Higher disk performance is 
also correlated with quicker snapshot reverting capability (which speeds up testing since snapshot 
reverts are done at the beginning of each test). 

The vCenter server needs RAM and hard disk capacity proportional to the desired range size. For larger 
ranges, an enterprise-class database server (e.g. Oracle llg) is recommended. 

2.3.3 Physical Machine Test Resources 

For a range including physical machines, you will need to acquire the physical machines themselves and 
one or more network-accessible PDUs to enable tyrant to automatically power machines on and off. 

In selecting the physical machines themselves, consider the following constraints: 

• The machines should have an Ethernet network interface which supports the PXE standard. The 
vast majority of machines support this; you will find it difficult to find one which doesn’t. 

• The machines should be capable of being configured to automatically start up whenever the 
outlet to which they are connected is powered on. This is configured via the BIOS menu on the 
machine. 

Tyrant currently only has support for APC PDUs which are controllable via SNMP. (Specifically, we have 
used an APC AP7930 PDU in our development.) 

2.4 Repository Structure 

The core of Tyrant is provided in two Mercurial repositories: tybase and tyworkflow. The tybase 
repository provides the tools used for running single tests. The tyworkflow repository provides the tools 
used for running tests across ranges of resources and managing these ranges. The tybase repository is 
standalone, but the tyworkflow repository has dependencies on tybase. 

With the split of functionality between tybase and tyworkflow, it can sometimes be difficult to 
determine which repository you should be in to perform certain operations. Unless specifically directed 
otherwise by this manual, the following are a few general rules to help you determine the correct place 
to perform certain operations: 

• If the operation relates to running an individual test against specific test resources without use 
of any range scheduling (e.g. overmind), you should be in tybase. 

• If the operation relates to linking in collections of test scripts (leafbags), you should be in tybase. 

• If the operation relates to scheduling tests to run on a range or managing a range of shared 
resources, you should be in tyworkflow. 

The upsync_apps repository contains code used for testing with PSPs (Personal Security Products) and 
for automatic software updating. It also contains support for detecting PSP events by watching a 
resource’s screen for changes. 
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Related to tyutils are the PIL repositories (PIL-linux-i686 and PIL-linux-x86_64). These provide the Python 
Imaging Library, which is used by tyutils event detection to find changes between screenshots. Two exist 
because they contain compiled code which needs to match the architecture of the system it will run on. 

In-depth descriptions of the contents of each repository are included in Appendix B - Detailed Repository 
Layouts. 

2.5 Tools 

This manual covers the following Tyrant tools, which will be used and configured by the range 
administrator (with the repository containing each tool in parentheses): 

palantir (tybase): A component which runs on each test resource, allowing the resource to be 
controlled by test scripts via a TCP connection. Palantir serves on ports 51134 and 51135 (for 
Windows) and 51134 (for Linux). 

comp_admin (tybase): A component which allows you to perform computer-level operations against 
a computer (e.g. power_on, power_of f , save_state, etc.). 

undermine (tybase): The component which runs a single instance of a test. Undermine connects via 
palantir to the resource(s) used by a test and runs the given test script. 

create_resource (tyworkflow): A component which uses the HAL to save the current state of the 
given computer and then automatically creates a resource entry for that combination of 
computer and state in the database. 

overmind (tyworkflow): The component which schedules tests to run simultaneously. Overmind 
uses a database of test resources and schedules these resources for incoming tests. 

reaper (tyworkflow): The component which handles reverting a resource to a clean state prior to 
running a test. What this means depends upon the reaper module in use for a given resource. 
For a virtual machine, this usually means reverting to a snapshot, but for a physical computer, 
this could mean using some imaging or auto-building solution to build out a certain OS 
configuration on a resource on-demand for a test. 

overview (tyworkflow): The component which displays test results and allows some management of 
test resources via a web-based interface. 

process_plan (tyworkflow): A command-line tool used to process test plans (specifications for 
running multiple instances of tests against a specified variety of resources). This lets you submit 
plans to be run or see how many combos (specific runs of a test on specific resources and with 
specific parameters) would be run. 

remote_commit (tyworkflow): A tool which allows remote submission of tests to a central Tyrant 
environment. While overmind on its own allows simultaneous tests to be run against a range of 
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resources, remote commit makes overmind useful for a team of developers running different 
sets of code. 

autoplan (tyworkflow): A tool which allows the automatic generation and processing of simple test 
plans based upon command line arguments. This allows users to run tests across a diverse range 
of resources without having to write a test plan. 


2.6 Directory Structure 

Because each setup will involve testing different components, users will need to create a directory 
structure similar to the following: 



Figure 7 Typical Tyrant testing environment directory structure 

Symlinks can be made between the various repositories using the standard In - s command. 

Additionally, a user should be in the base of either tyworkflow or tybase to run specific commands. For 

example, if a user was running . /bin/ undermine, he would be doing so out of the tybase directory. 

If he was running . /bin/remote_commit, he would be doing so out of the tyworkflow directory. 

2.7 Assumptions 

This document makes some assumptions about the type of setup desired. Specifically: 

• It is assumed that testing will be performed on VMWare ESXi virtual machines or physical 
machines connected to a network-addressable APC PDU. 

• It is assumed that remote commit-style testing will be performed (with a team of developers, 
each at their own workstation, remotely submitting tests to a central server). 

• It is assumed a mysql database will be used for storing resource and test information rather than 
sqlite3. 
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It is assumed you will configure the reaper to connect directly to individual ESXi servers when 
reverting VMs rather than going through a vCenter server. 

The remote commit environment on the test server will run as root, and developers will submit 
their tests as root on the test server. 

It is assumed that each VM resource will be set up to revert to the current snapshot, whatever 
that is (rather than having multiple named snapshots per VM). This doesn't mean that a VM 
can't have more than snapshot, only that overmind will be set up to simply revert to the current 
one. 

The administrator is assumed to have a working knowledge of Unix-like operating systems, 
especially the platform chosen as the testing server. 
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3 Range Setup 

3.1 Pre-Tyrant 

Prior to involving Tyrant tools, the environment for the range must be set up. For this you will need as 
many servers as necessary to host the number of virtual machines you wish to have in your range and as 
many PDUs as necessary to provide outlets to support the number of physical machines you have. You'll 
also need a Linux server which may be either a separate physical server or a virtual machine in the ESXi 
range. This will be known henceforth as the “test server” and is where overmind, the reaper, and 
overview will be run. A separate physical server is recommended as the test server will be subjected to 
heavy load if many users are running tests concurrently and will need to store a lot of data (depending 
upon how much data the tests generate). 

The test server should meet the following requirements: 

• runs Fedora 10 or later (these are explicitly supported, though other Linux distributions have 
been known to work without modifications) 

• has Apache HTTPD installed (in Fedora, the httpd package) and set to start on boot 

o For newer versions of Fedora, run systemctl enable httpd . service. For 
older versions, run chkconfig --levels 2345 httpd on. 

o Start it using service httpd start. 

• “apache” server system user must have sudo permission for all commands without a password, 
and requiretty must be disabled 

0 Edit /etc/sudoers 

o Add the line “apache ALL=(ALL) NOPASSWD: ALL” 

0 Comment out the line “Defaults requiretty” 

• has MySQL server installed (in Fedora, the mysql-server package) and set to start on boot 

o Use the same commands as the previous item, but replace httpd with mysqld. 

0 When MySQL is started for the first time, set the password. The command to do this is 
likely mysqladmin -u root password "<newpassword>". 

O Configure a user in MySQL which has permissions to create and delete databases. 
Typically, we use the root account, but you can create a different account for this 
purpose. In either case, you must be able to log into this account from remote 
machines. In the case of using the existing root account, you may need to enable 
remote logins. The following command will do this (replace “password” with the root 
password for your system): 

■ GRANT ALL PRIVILEGES ON *.* TO 'root'§'%' IDENTIFIED BY 
'<newpassword>' WITH GRANT OPTION; 

• has PHP installed, with the MYSQL, GD,JSON, and POSIX extensions (in Fedora, the php-mysql, 
php-gd, php-json, and php-posix packages, which may be included inside of php-common) 

0 Note: The JSON extension is built-in as of PHP 5.2.0. If your PHP predates that version, 
you’ll need to install JSON separately. For 5.2.0+, do not install any separate JSON 
package. 


UNCLASSIFIED 


18 



UNCLASSIFIED 


• has mercurial installed, including the hgweb component (in Fedora, the mercurial package) 

• has zip installed (in Fedora, the zip and unzip packages) 

• has python, python-devel and MySQL-python packages installed 

• has a DHCP server installed and configured to respond to DHCP requests on whichever interface 
is connected to the same network as the test resources (in Fedora, the dhcp package) 

• has an accurate clock which is kept in sync with developer workstations 

Configure a mercurial server. This can be done in one of two ways: 

• For small environments, you can use the built-in mercurial web server. To do this: 

0 Designate a directory on the test server that will contain the repositories being served, 
(e.g. /proj/repos) 

O In this directory, create a text file called hg r C. In this file, place the following contents 
replacing ALL instances of PATH with the path to the directory contains repositories to 
serve. Yes, this does mean the above line has the same path twice. 

[collections] 

<PATH> = <PATH> 

[web] 

push_ssl=False 

allow_push=* 

o Run the command nohup hg serve --webdir-conf <PATH>/hgrc &where 
PATH is again the path to the directory of repositories to serve. 

0 Your repositories will be available at e.g. http://testserver.example.com:8000. 

0 To make the mercurial server start at boot, add the above command to 

/ etc/ rc . local in the proper place for user-added commands (generally, just at the 
end of that file, though comments therein may direct you otherwise). When you put the 
command in /etc/rc . local, ensure that the path to the hgrc file is an absolute 
path. 

• For larger environments, you can run a mercurial server through Apache httpd. Follow the 
instructions here: http://mercurial.selenic.eom/wiki/PublishingRepositories#hgweb. 

Configure the traffic to get through iptables firewall (which is on by default in most Fedora builds). This 
can be done by either wholly disabling the firewall or by enabling specific ports that are not on by 
default. 


• Disabling iptables can be done by running the following command as root: service 
iptables stop. 

• Changing the iptables rules can be done in a variety of ways - depending on the administrator’s 
preference. One method for doing so would be modifying / etc/ sysconf ig/ iptables (or 
equivalent config file) to open up ports 80 (http), 443 (https), 3306 (mysql), and 8000 (hgweb). 
After modifying the file, remember to restart iptables using the following command: service 
iptables restart. 
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Either way, add the tybase and tyworkflow repositories to the set of repositories served here. The code 
is provided as a repo with a single initial commit, so you can just copy these directories to the repository 
directory on the test server. When copying from a CD, you may have to fix the permissions of the files 
since files copied from a CD are sometimes set without write permission. To do this, for each repository 
you copy, run the command 

Chmod -R u+w <DESTDIR> 

on the copy on your system, where <DESTDIR> is the path to that copy. 

Configure the MySQL server to start at boot. Create a user which has permissions to create and drop 
databases and can log in from remote machines. 

Configure Apache HTTPD be accessible from developer workstations. Also, ensure that developer 
workstations can connect to the test server. 

3.2 Remote Commit Environment Setup 

Next, we'll set up the environment from which developers submit their tests. The remote job submission 
scripts are written in bash and have dependencies on modern Fedora Linux distributions to be able to 
function without modification. What follows is a description of the process that LM ATL went through to 
set up their most recent Fedora 18 (F18) environments. 

1. Acquire the i686 or the x64 Live version of F18. 

(We downloaded ours from http://mirror.anl.gov/pub/fedora/linux/releases/18/Live/.) 

2. Boot the i686 or x64 media and install using default settings. 

3. Once up, perform an install (or update) of the following packages: 

• mercurial (hg) 

• make 

• python 

• mysql 

• MySQL-python 

• zip 

(NOTE: Most of the packages installed are not strictly required; however, we have found that 
having them installed in our remote job submission environments has been beneficial over the 
past years, so we now standardize on having them installed.) 

(NOTE: If using Fedora the yum server is not immediately reachable, the user would need to 
modify the /etc/yum . conf file to point it at the appropriate yum server.) 

We understand that some in the future might want to do remote job submission natively from a 
Windows host. This has been something we have historically worked around by mounting directories 
with Samba (e.g. mounting the build directory of a tool on Linux such that when the tool is built, it is 
immediately accessible from the remote job submission scripts). However, we anticipate needing to 
(and want to) write a native Windows process down the road. 
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It is somewhat likely that the tools being tested themselves may put other requirements on the remote 
job submission environment (especially if a user wants to integrate tool-specific build and customization 
logic as part of test submission). Those requirements are not accounted for here. 

IMPORTANT: If there are special packages and libraries that need to be installed in order to run tests, 
these packages must also be installed on the Tyrant server as well as the remote commit machine. 

3.3 Remote Commit Directory Setup 

On the test server, as root, create a directory which will be the root of the remote commit environment. 
Our convention is to use /pro j / testing, and this document will use that path throughout. This is 
henceforth referred to as the “remote commit root”. 

• In the remote commit root, create a directory named commits (henceforth referred to as the 
“commits directory” e.g. /pro j /testing/commits). 

• Inside the commits directory, create subdirectories named for each developer's and each 
administrator's username. As new developers come on board, you will need to create new 
directories for them. These directories are where all the files necessary to run a test are synced 
to when a developer submits a test with remote commit. By default, when a developer submits 
a test, the files are synced to the directory with the developer's username. By creating multiple 
directories (e.g. jdoe, jdoe2, jdoe3, etc), developers can submit multiple tests to run 
simultaneously by specifying the commits directory subdirectory name when they run remote 
commit. 

0 NOTE: If you desire for some users to submit tests with remote_commit as a non-root 
user, then chown their commits directories to be owned by the user and primary group 
they will use. For example, if developer John Doe has a user account named “jdoe” 
whose primary group is “users”, then after creating his commits directories, you would 
run e.g.: 

chown jdoe:users /proj/testing/jdoe 

/pro j /testing/ jdoe2 /pro j /testing/ jdoe3 
You will then need to inform John Doe that he should set “jdoe” for the value of the 
remote_user option in the remote_commit section of rc/remote_commit . rc 
(this is covered in the user manual). 

• Back in the remote commit root directory (e.g. /pro j /testing), clone both tybase and 
tyworkflow. cd into tyworkflow and run make. 

• For the administrators and each developer or tester who will submit tests to this remote commit 
environment, add their public SSH identification key to the . ssh/authorized_keys file on 
the test server for the user account that user will use to log in to the test server.. This will allow 
the developer to log in without a password as root. Without this, the developer would have to 
type in their password several times when they submit tests. For users who log in as root, this 
would be /root/ . ssh/authorized_keys. For John Doe, to continue the example from 
above, this would be -jdoe/ . ssh/authorized_keys. 
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3.4 Over mind Database Setup 

Next, the Overmind Database must be set up. We set up the database before setting up the test 
resources because some of the tools we’ll use to set up physical machine test resources require data 
about the physical machines to already be in the database. The purpose of this step is to populate the 
database with all the data necessary to automatically run test plans. To that end, we will be populating 
the database with information about the following entities: 

• recipes: Specifications of the OS and other software which run on a test resource. 

• VM hosts: The bare metal machines on which virtualization infrastructure like ESXi runs and on 
which VM computers reside. 

• PDUs: Network-addressable devices which allow switching power to physical computers on or 
off, to which physical computers are connected. 

• VLANs: Definition of the networks to which the resources may be connected, including a 
specification of MAC and IP address pools to select from when cloning machines in a VLAN. 

• computers: The “shells” on which recipes run (e.g. the physical or virtual machine on which one 
would install an OS and other software) 

• resources: Resources are the combination of a computer, a snapshot name, and a recipe. 
Effectively, a resource entry declares that “snapshot X on computer Y contains recipe Z”. 
Resources are the entities which overmind schedules for testing. 

First, we configure and initialize the database. In your tyworkflow directory in the remote commit root, 
copy rc/ def aults/db . rc to rc/. In the copied file, make the following changes: 

• set resource_manager/dbname to the name of the database you want Overmind to use 
(db_admin will create it for you in a later step) 

• set resource_manager/engine to mysql instead of sqlite3 

• set mysql/host to the hostname or IP address of the server the database resides on 

• set mysql/user and mysql/passwd to the username and password used to access the 
database you set in resource_manager/dbname 

• For the above settings, make sure you uncomment any options which you set which are 
currently commented. That is, remove the semicolon from the beginning of any option line that 
you modify. 

Initialize the database by running the following command from the root of tyworkflow. If your database 
already exists and you wish to wipe it out, add the +d r op option to the command line: 
bin/db_admin init_db 

3.4.1 Importing Recipes 

Next, populate the database. First, create a CSV file of the recipes with one recipe per line in the 
following format: 

<name>, <family>, <os>, <ossp>, <lang>, <arch>, <apps> 

None of these fields have specifically defined values (they are just strings in a database that are matched 
in various places), but the values used must be consistent between the overmind database and test 


22 


UNCLASSIFIED 


UNCLASSIFIED 


plans used with the range, because test plans will match on these fields. The values suggested below are 
the most commonly used values, and their use is strongly recommended. 

• name: Whatever you want (that can be put in a SQL text field). A convention is to do family - 
os-lang-arch, with all the values shortened (e.g. lin-f Cl4-en-x86 for a 32-bit Fedora 
14 English resource). Throw in the service pack before the language if it's a Windows resource 
(e.g. win-xp-sp3-en-x86). 

• family: win, lin 

• OS: 

0 for Windows, abbreviated OS name plus edition (e.g. xppro, 7ult, 2k8entr2) 

0 for Linux, abbreviated OS name and version (e.g. fcl2, fcl9) 

• OSSp: 

0 for Windows: spO, spl, sp2, sp3, spla (typical Windows service pack numbers) 

0 for Linux: ws 

• lang: use the two-letter language codes that Microsoft uses 

• arch:x86, x64 

• apps: This held is a comma-separated list of app names, which typically indicate either that a 
certain piece of software (an "app") has been installed in this recipe, or that a certain feature has 
been enabled. Leave blank if not using any apps. 

Import your recipes file with the command: 

bin/db_admin import_recipes <RECIPE_FILE> 
where <RECIPE_FILE> is the path to the recipe file you created. 

An example recipe . CSV file with a recommended schema is available for customization in the docs/ 
directory. 

At this point, you should be able to use db_admin to view the recipes you’ve imported. Run 
bin/db_admin list_recipes 

The output should contain records for each recipe you had in recipes . CSV, with all the correct field 
values. 

3.4.2 Importing VM Hosts and PDUs 

In order to support computer-level operations (e.g. power control and saved state control), the database 
needs to know metadata about the VM hosts on which VMs reside and the PDUs to which physical 
machines are connected. To this end, you must populate the database with information about VM hosts 
and PDUs (done in this section). Then when importing computers, inform the database which VM host 
or PDU the computer resides on (done in section 3.4.3). 

First, we’ll import VM hosts. Create a VM host csv file with one line per VM host, in the following 
format: 

<host>, <type>, <username>, <password>, <max_vms_in_use> 

The fields have the following meanings: 
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• host: The hostname or IP address of the VM host. 

• type: The type of VM host. This field is used to pick which HAL implementation to use to 
perform computer-level operations on resources on this VM host. Use esxi here, as all your 
VMs are ESXi VMs. 

• username: The username used to log in to the VM host. 

• password: The password used to log in to the VM host. 

Import your VM hosts file with the command: 

bin/db_admin import_vm_hosts <VM_HOSTS_FILE> 
where <VM_HOSTS_FILE> is the path to the VM hosts file you created. 

Verify this by running this command: 

bin/db_admin list_vm_hosts 

You should see the VM hosts you just imported. 

Next, create a PDU csv file with one PDU per line, in the following format: 

<host>, <model>, <username>, <password> 

The fields have the following meanings: 

• host: The hostname or IP address of the PDU. 

• model: The model name of the PDU. This is used to pick which PDU control implementation 
the HAL will use to perform power operations on computers connected to this PDU. Use ape 
here, since all your PDUs should be APC PDUs. 

• username: The username used to log in to the PDU. Leave this field empty since our APC PDU 
control implementation does not require authentication (because it uses SNMP). 

• password: The password used to log in to the PDU. Again, leave this field blank. 

Due to the blank username and password fields, you’ll end up with csv lines that look like the following: 

somehost, ape, , 

This is expected. 

Import your PDU file with the command: 

bin/db_admin import_pdus <PDU_FILE> 
where <PDU_FILE> is the path to the PDU file you created. 

3.4.3 Importing VLANs 

Next, create a vlan CSV file. This file will inform tyrant about the networks which exist in your range. In 
the case of VMs, the vlan names MUST match the name of the virtual network to which your VM’s 
network interface is attached (i.e. the network name you select in the Properties dialog, in the “Network 
label” dropdown for an interface on ESXi VMs). Put lines in the vlan CSV file in the following format: 
<name>,<ip_min>,<ip_max>,<mac_min>,<mac_max> 

The fields have the following meanings: 

• name: The name of the vlan. 
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• ip_min and ip_max: Lower and upper bounds on a pool of IP addresses from which new unique 
IP addresses will be chosen when needed (e.g. when cloning a computer in this vlan). It is alright 
if some IP addresses in this pool are already in use in the tyrant database, as the code which 
assigns new IP addresses will check for this. These fields MUST be IPv4 addresses specified in 
dotted-decimal notation. 

• mac_min and mac_max: Lower and upper bounds on a pool of MAC addresses from which new 
unique MAC addresses will be chosen when needed (e.g. when cloning a computer in this vlan). 

It is alright if some MAC addresses in this pool are already in use in the tyrant database, as the 
code which assigns new unique MAC addresses will check for this. These fields must be specified 
in the traditional hexadecimal format, with octets separated by colons hyphens are 
not acceptable. 

0 NOTE: When setting MAC ranges for VMWare, the valid MAC address range is 

00:50:56:00:00:00-00:50:56:3F:FF:FF. You will get errors if you try to use addresses 
outside of this range and operations such as cloning will fail. Also, due to how the clone 
process autogenerates clone names (if no clone name is specified), you should limit your 
range to 65,536 addresses (e.g. 00:50:56:00:00:00 to 00:50:56:00:FF:FF). 

• netmask: The subnet mask in use on the vlan. This must be in IPv4 dotted-decimal notation. For 
example, if you're used to thinking about your vlan as "192.168.0.0/16", then the value for this 
field would be "255.255.0.0". 

• gateway: The default gateway for the vlan, in IPv4 dotted-decimal notation. This field is optional; 
leave empty in the CSV if you don't want to set it. 

• dns: IP addresses of DNS server, in IPv4 dotted-decimal notation. Multiple DNS server addresses 
must be separated with semicolons. This field is optional; leave empty in the CSV if you don't 
want to set it. 

3.4.4 Importing Test Resources 

Next, create a resource CSV file which contains the computer and snapshot definitions. The file takes 
lines in pairs of the following format, where the first line defines a computer and the second line defines 
a snapshot on that computer. 

<name>, <ip>, <mac>, <hwtype>, <model>, <pool>, <vlan>, <state_type>, <reaper>, <ext_attrs> 
©snapshot, <recipe>, <snapshot> 

For your VM test resources define one computer line and one snapshot line for each VM. For your 
physical machine test resources, define ONLY the computer line (the resource entry will be created later 
on). 

The fields have the following meanings: 

Computer fields: 

• name: The name of the computer. This must match the name of the VM in the virtualization 
environment. 

• ip: primary IP address of the computer (which the test server can use to connect to the VM) 
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mac: MAC address of the interface having the IP address given in the ip field. This should be 
formatted with colons (e.g. 01 : 02 : 03 : f 1 : f 2 : f 3). 

hwtype: a string designating a broad hardware type. For physical machines, this must be 
“phys”, and for virtual machines of any kind, this must be “vm”. The system will function with 
degraded functionality if other values are used here, but many HAL operations will not work. For 
USB thumb drives, use “thumb”. 

model: a string designating the model of computer. There is currently no pre-defined value for 
this; the intention is to give some indication of the hardware present in the computer. To that 
end, for physical machines, we recommend using a concise representation of the computer’s 
actual model (e.g. for a Dell 1850, you might use "delll850”) and for ESXi VMs, you could just 
use “esxi” for simplicity. 

pool: pool name the computer is part of. The pool is just a grouping construct for VMs. We use 
it to group VMs for certain types of testing. Then, in the plan files for our tests, we restrict 
certain tests to certain pools. 

vlan: name of the vlan the computer is connected to. This is used as a designation of the 
network the VM is on and must match the name of a vlan imported in the previous section. In 
our test environment, we have several networks VMs can be connected to, some of which are 
isolated from others. Some of our tests require VMs on the same network, or on two different 
networks, so we use the vlan field in test plans in order to select VMs from the desired networks. 
Using a feature of the overmind test plan language, users can write test plans which require 
resources that reside on the same (or different) networks. 

state_type: name of the type of saved states used by this computer. The value here is used 
by the HAL to pick the saved state control implementation which will be used to revert the 
machine to a clean state (when the reaper sanitizes the machine at the beginning of a test), 
among other places. Use “esxi” for your VMs, and “clonezilla” for your physical machines. For 
USB thumb drives, use “nop”. 

reaper: name of the reaper to use. This needs to match one of the reapers defined in the 
merging of rc/reaper . rc and rc/def aults/reaper . rc (which is covered in 
section 3 . 8Reaper Setup). For the parallel reaper setup this document uses for ESXi 
VMs, this field is the DNS name of the ESXi host on which the resource resides, matching the 
names of one of the sections you will put in your reaper . rc in the following section (e.g. for 
an ESXi host named "foo”, you will define a reaper . rc section called “reaper_foo”, and a VM 
which resides on this ESXi server would have a reaper value of “foo”). For physical machines, set 
this to “clonezilla”. 

ext_attrs: extra optional fields which provide hardware-type-specific extended attributes 
about the computer. These are specified as comma-separated key/value pairs in the format 
<KEY>=<VALUE>. For ESXi VMs, define the vm_host attribute to be the hostname or IP 
address of the ESXi host the VM resides on (which must be identical to one of the host fields 
you set when you imported VM hosts in section 3.4.2). For physical machines connected to a 
network-addressable PDU, define the pdu_host attribute to be the hostname or IP address of 
the PDU the machine is connected to (must match a PDU you imported in section 3.4.2) and 
define the pdu_OUtlet attribute to be the outlet number to which the machine is connected. 
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For an ESXi VM residing on an ESXi host named abdon, you might have the following line in your 
computers CSV file (this would be a single line in the CSV hie; it’s only wrapped here due to space 
constraints): 

compQl, 192 . 168 . 5 . 45, 00 : 50 : 5c : 3e : 00 : 04, vm, esxi, poolOl, vlanOl, esxi, 
abdon, vm_host=abdon 

For a physical Dell Precision T3500 using clonezilla saved states connected to outlet 6 of the PDU at 
IP address 192.168.4.24, you might have the following computer CSV line (again, a single line in the 
CSV but wrapped here due to spacing): 

compOl, 192 . 168 . 5 . 46, 5c : 26 : 0a : 41 : e5 : 61, phys, dellpt3500, poolOl, vlan01, 
clonezilla, cz, pdu_host=l92 . 168 . 4 . 24, pdu_outlet=6 

Snapshot fields: 

• recipe: the name or ID of the recipe this computer has installed on it; must match an existing 
recipe (as you would have imported just a bit ago) 

• snapshot: the unique identifier of the snapshot for this computer. The meaning of this field is 
dependent upon the state control implementation used for the computer this snapshot is bound 
to (as selected by the state_type held earlier). For the “esxi” state type, this held is the 
name of an ESXi snapshot, or the special value “latest” which means to simply use the VM’s 
current snapshot. For physical machines with the “clonezilla” state type, you should not be 
putting snapshot lines in the CSV hie (this will effectively be done later). 

Administrator note: If using named snapshots in overmind, the name of the specihed snapshot 
must be unique across all snapshots for the VM. 

Administrator note: Because “latest” in the snapshot held for ESXi VMs has special meaning, do 
not name any snapshots “latest” in ESXi if using named snapshots in overmind as they cannot be 
reverted to by name. 

Import your resource hie with the command: 

bin/db_admin import_computers <C0MPUTER_FILE> 
where <C0MPUTER_FILE> is the path to the hie you created. 

An example computer . CSV hie is available for customization in the docs/ directory. 

At this point, you should be able to see your imported data in the output of a few db_admin 
commands. First, run this command: 

bin/db_admin list_computers 

This command shows information about all the computers in the overmind database. This deals roughly 
with the physical (or virtual) “shell” on which a recipe resides. Thus, list_COmputers shows 
information like the IP address, MAC address, etc, but not recipe information like OS family or service 
pack level. You should see all of the VMs you listed in computer . csv (the hrst line of each pair of 
lines) in the output of this command. Next, run this command: 
bin/db_admin list_snapshots 

This command lists all the snapshot names known by overmind. In your case, there should only be one 
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record, named “latest” (later, after set up the physical machines, this command will show more than just 
“latest”). Finally, run this command: 

bin/db_admin list_resources 

This command shows you the resources available for testing. A resource is a combination of a computer 
(the shell on which a recipe resides), a snapshot name, and a recipe (this is the reason why there is only 
one snapshot record from the previous command). You should see all of the snapshots you listed in 
computer . CSV (the second in each pair of lines) in the output of this command. Currently, this will 
only show you the VMs you added because we have not finished setting up the physical machines. 

Set the computers you added in the database to be usable for testing. By default, when you insert 
computers into the database, they have their testing_use flag set to 'N' in case you don't want them 
to immediately get used for tests. Run the following command to set all computers in the database 
which are currently set not to be used for testing to be used for testing: 

bin/db_admin set_computer_flag testing_use 1 Y 1 testing_use= ' N ' 

This command is saying "set the testing_use computer flag to 'Y' for all computers which currently 
have the testing_use flag set to 'N'". If you then run 
bin/db_admin list_computers 

again, you should see all the same computer records, but the testing_use field should now say “N” 
for all of them. 

After setting up a resource, you may also test power options on the resource. To do this, go to tybase 
root and run the following power cycle command for the computer: 

bin/comp_admin name=test_comp power_cycle 

Running the power cycle command should cut off power from the PDU to the machine. (You should see 
the light for the outlet on the PDU turn off.) Then, after the delay time, power from the PDU to the 
machine should be restored. 

3.4.5 Creating Database Users 

So far, you have been using the root database user (or a database user with root privileges). Tyrant 
allows you to also create database users with lower levels of privilege (or create more users with root 
privileges). Tyrant has three database privilege levels: 

• view: Users at this level are only able to read data from the database. They cannot make any 
changes or add anything to the database. 

• test: Users at this level are able to run tests and create and destroy clones. 

• admin: Users at this level have full privileges. They can change anything in the database or even 
drop and re-create it. 

If you know now that you have certain people whom you’ll want to have different levels of access, create 
database user accounts for them. For each user, choose a username and password and decide what 
level of access they should have. Then, in tyworkflow, run the following command: 

bin/db_admin add_dbuser <USERNAME> <LEVEL> <PASSWORD> 
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Inform each user of their username and password. They will put this username and password in the 
mysql section of rc/db . rc in their workstation copies of tyworkflow, just as you did in this test server 
copy of tyworkflow at the beginning of section 3.4. Optionally, you may omit the <PASSWORD> 
parameter from the above command and you will be prompted to enter and confirm a password. 

3.5 Default Overmind Post-Op Threads Configuration 

Users can specify post-test operations as part of their test plans, which will be run in the context 
of the Overmind server (i.e., not as part of the Undermine process; in this way they differ from 
the “presetup” and "postcleanup” operations that can be encoded in test plans to run as part of 
Undermine). For example, a user can write a test plan such that if a test returns a certain result 
code, then clones will be created from all the resources used in that test. These post-test 
operations can have arbitrary runtime. As such, they cannot simply be run as part of the 
Overmind scheduling loop, when Overmind sees that a test has finished and cleans up after it 
(freeing resources, writing information to the database, etc), since doing so would cause the 
server to hang to indeterminate periods of time. 

To handle this, if a test instance has post-test operations, those operations are all run, serially, in 
a thread for that test instance. Overmind can be configured for the maximum number of post- 
test threads to run at one time. If this limit is reached, then further tests which have post-test 
operations are put on a queue to be run later when threads become available. The maximum 
number of concurrent post-test operation threads can be configured via the 
max_popthreads setting in the overmind section of rc/overmind . rc. The default 
setting for this value is listed in rc/def aults/overmind . rc. Consider the processing 
power of your test server and the capacity of your range and, if desired, change this limit by 
editing rc/overmind . rc (do not edit the file in rc/def aults). For example, if you wish 
to set the max postop threads setting to 10, you would put the following in 
rc/overmind . rc: 

[overmind] 

max_popthreads = 10 

Some notes are in order: 

• Because this is a per-overmind setting, users can override this when they commit tests 
to the server, either by modifying overmind . rc, or using the set_popthreads 
subcommand to remote_commit. 

• Other limits also apply which will effectively limit the load on your range. For example, 
ESXi hosts can have a limit on the number of concurrent VMs in use. Because the 
resources are still marked as in-use at the time post-op threads are run, if you have 
many tests running which all want to do post-test cloning, the number of VMs actually 
being cloned at any given time as part of post-test operations will still respect the ESXi 
host limit. 
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3.6 Clonezilla Setup 

To support saving and restoring disk images of physical machines with clonezilla, you must set up the test 
server to enable computers in the range to netboot from it. This section assumes your test server is 
already set up as a standard DHCP server, so in this section we simply add the necessary hies and 
configuration to support netbooting from it. 

1. Install the tftp-server and syslinux packages (command for Fedora servers): 

yum install tftp-server syslinux 

2. Copy the PXE "boot stub” hie to the tftp server root directory: 

cp /usr/share/syslinux/pxelinux . 0 /var/lib/tf tpboot 

3. Set up the PXE conhg directory and populate it with a default config hie. This default config hie 
causes any machine for which there does not exist a specific MAC-address-named conhg hie to 
simply boot its local hard drive (effectively the same as if it had not netbooted at all, but with a 
slight delay for the netboot process to occur). 

a. Create the directory: 

mkdir /var/lib/tf tpboot/pxelinux . cf g 

b. Write the following text to /var/lib/tf tpboot/pxelinux . cf g/def ault: 

DEFAULT local 

LABEL local 

LOCALBOOT 0 

4. Enable the tftp server to start on boot. How this is done depends upon what version of Fedora 
your test server is running, as follows: 

a. If your version of Fedora predates systemd (in which case the “systemctl” command will 
not exist), then we’ll use chkconf ig to set xinetd to start at boot time, which will 
automatically start tftp on-demand: 

i. Ensure the disable setting in /etc/xinetd . d/tf tp is set to “no”. 

ii. Run the command chkconf ig --levels 2345 xinetd on 

iii. Run service xinetd start to start xinetd (if not already running). 

b. If your version of Fedora has systemd but the hie 
/usr/lib/systemd/system/tf tp . socket does not exist, we will set systemd 
to start xinetd at boot: 

i. Ensure the disable setting in /etc/xinetd . d/tf tp is set to “no”. 

ii. Run the command systemctl enable xinetd . service 

iii. Run systemctl start xinetd (if not already running). 

c. If your version of Fedora has systemd and the tftp.socket hie exists, then we’ll set 
systemd to start tftp directly on-demand: 

i . Run the commands: 

systemctl enable tftp.socket 
systemctl start tftp.socket 

5. Modify the dhcp server conhg at /etc/dhcp/dhcpd . conf to conhgure netbooting support. 

a. Outside of any subnet section (e.g. at the top of the hie) add the following lines, which 
enable netbooting support generally: 
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allow booting; 
allow bootp; 

b. In the subnet section which serves DHCP for your physical machines, add the following 
setting which tells dhcpd what file to serve to machines requesting to netboot: 

filename "pxelinux.0"; 

c. Restart the dhcpd service to apply the changes. For older, pre-systemd versions of 
Fedora, do 

service dhcpd restart 

For systemd-based versions, do 

systemctl restart dhcpd 

6. Obtain the clonezilla live stable release i486 zip tile (the stable Debian version, not the 
“alternative stable” Ubuntu version). This can be downloaded from 
http://clonezilla.org/downloads.php. 

7. Create a directory which will contain the clonezilla netboot root (which will be served via NFS 
and used as the root filesystem for computers which are netbooting to do clonezilla operations) 
and unpack the downloaded clonezilla zip to it. We’ll use /proj/cz_netboot_root for our 
example. Then, copy a couple tiles from the clonezilla root to the tftp root. Run: 

mkdir /proj/cz_netboot_root 

unzip -/clonezilla-* .zip -d /proj/cz_netboot_root 
cp /proj/cz_netboot_root/live/vmlinuz /var/lib/tf tpboot 
cp /proj/cz_netboot_root/live/initrd . img /var/lib/tftpboot 

8. Create a directory which will be used as the root for clonezilla disk images, and create a couple 
directories inside of it. We’ll use /proj/czJmg_root in our example. Run this command: 

mkdir -p /proj/cz_img_root/{images, markers} 
chmod -R 755 /proj/cz_img_root 

9. Share the clonezilla image and netboot directories via NFS. Add the following lines to 

/etc/exports: 

/proj/cz_netboot_root *(rw, sync, no_root_squash) 

/proj/cz_img_root *(rw, sync, no_root_squash) 

Then, run this command: 

exportfs -r -a 

10 . If you want users to be able to initiate clonezilla operations remotely (which could occur 
if, for example, a user were to use the hal in their test to save or restore a state on a resource 
whose state_type is “clonezilla”), start the clonezilla XMLRPC daemon. In the root of tybase on 
the test server, run 

bin/czxmlrpcd 

If desired, configure the test server to automatically start the clonezilla XMLRPC daemon at boot 
time. For example, put an entry in /etc/rc . d/rc . local, like the following: 

(cd /proj/testing/tyworkflow && bin/czxmlrpcd) 
a . NOTE: The clonezilla XMLRPC daemon can be managed by bin/czxmlrpc_admin. 
Run with the - h option to see available functionality. 

Next, verify that the functionality necessary for netbooting is working. 
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1. First, verify tftp functionality. From another linux machine (either an existing Linux VM or a 
developer workstation), run the following (if necessary, do yum install tftp first to get the tftp 
client), replacing <TESTSERVER_IP> with the IP address or hostname of your test server: 

tftp <TESTSERVER_IP> -c get pxelinux . cf g/def ault 
When this command completes, you should find a file named "default” in the current 
directory, and it should have the same contents as what you put in the default PXE config file 
earlier. 

2. Next, verify that the default of booting to the local hard drive works properly. Simply select 
one of your existing test resources (a physical machine or virtual machine; it doesn’t matter 
which), ensure BIOS settings are such that the machine will attempt to netboot before 
booting its local hard drive, and then reboot it. You should see messages on the screen 
indicating that the machine is attempting a DHCP request, followed by some quick messages 
about PXE, and then the machine should start booting whatever OS is currently installed on 
it. 

a. If you see no signs whatsoever of a DHCP process, then the machine is likely not 
properly configured to attempt a netboot before booting the local hard drive. 

b. If you see signs of DHCP but get a failure (such as "No DHCP offers received”), then 
something is wrong with the DHCP server. Ensure it is running. 

i. Note, however, that if you have multiple network interfaces on your 

machine and the first interface is not connected to the same network as the 
DHCP server, you can expect to see the first interface fail, but then a later 
interface, which is connected to the same network as the DHCP server, 
should succeed. 

c. If the DHCP process appears to succeed, but then the TFTP process fails (for 
example, with the error “TFTP open timeout”), there is something wrong with the 
tftp server. Ensure it is running. 

d. In the case of other errors, such as tiles not being found, ensure that you copied the 
tiles to /var/lib/tf tpboot as prescribed earlier. 

i. Note that if certain tiles are missing, the PXE client in the machine’s BIOS will 
very quickly fall back to booting the local hard drive. You will need to watch 
the screen very closely to see if there are any error messages. If you miss 
errors here, it’s not the end of the world; they will become apparent later. 

3. Finally, ensure that the clonezilla netboot environment works properly. To do this, first 
obtain the MAC address of the machine you used for testing in the previous step. Create a 
tile named for the machine’s MAC address in /var/lib/tf tpboot/pxelinux . cf g, 
using dashes and lower-case letters, prepended with “01-". For example, if your machine’s 
MAC address is 00:50:56:3E:00:25, you would create the tile 

/var/lib/tf tpboot/pxelinux. cfg/01-00-50-56-3e-00-25. In this tile, 
place the following text (replacing <TESTSERVER_IP> with your test server’s IP address): 

DEFAULT CZ 

LABEL CZ 

KERNEL vmlinuz 
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APPEND initrd=initrd . img boot=live config noswap netboot=nfs 
nf sroot=<TESTSERVER_IP> : /proj/cz_netboot_root 
(Note that the last line is a single line.) 

Reboot the test machine. You should see it perform a netboot except this time, instead of 
going to the local hard drive, it should start booting a Linux kernel and eventually end up at a 
clonezilla text menu. 

a. If this fails, ensure you properly unzipped the clonezilla live zip to 

/proj /cz_netboot_root. The immediate contents of that directory should 
consist of two files (Clonezilla- Live -Version, GPL) and seven directories 
(boot, EFI, home, isolinux, live, syslinux, utils). 

b. NOTE: After finishing this test, make sure to delete the MAC-address-named config 
file you created. 

Having successfully completed the tests, we’ll finish by setting up tybase to use your clonezilla setup. To 
do this, in your tybase clone in the remote commit root directory, create the file rc/hal . rc and put 
the following in it: 

[state_clonezilla] 

netboot_root_nfs_host : <TESTSERVER_IP> 
img_root_nf s_host : <TESTSERVER_IP_OR_HOST> 

Replace <TESTSERVER_IP> with the IP address of your test server (it MUST be an IP address). 

Replace <TESTSERVER_IP_OR_HOST> with either the IP address or hostname of your test server. 

Our clonezilla module also includes support for wiping a drive before restoring an image to it, which is 
turned off by default but can be enabled in rc/hal . rc. The wipe is a simple single pass of writing 
zeroes to the drive. If you’d like this wipe to occur, then set the pre_restore_wipe option to True. 
Also, there is a timeout setting for the maximum amount of time to allow for wiping the drive. The 
default setting is 5400 seconds (1.5 hours), which based on a rough analysis, should be suitable for a 
500GB drive on a slow interface. If you want to change this timeout, set the wipe_timeout option to 
the desired number of seconds. Both these wipe-related options are set like those in the above rc file 
snippet, in the state_clonezilla section. 

3.7 Palantir Configuration (Optional) 

Tyrant provides palantir installation packages which are created when the make command is run in 
tybase. Prior to running make and installing palantir, there are certain configuration options you should 
set which will dictate how palantir is installed on resources in your range. All the settings mentioned in 
this section will be set in the palantir section of rc/palantir . rc. See 
rc/ defaults/palantir . rc for detailed explanation of all palantir configuration options. 

If you wish palantir to use nonstandard ports (the defaults are 51134 for the service instance, which runs 
as root on Linux and as SYSTEM on Windows and 51135 for the user instance [which runs as the 
currently-logged in user; only for Windows]), set the svc_port and usr_port settings. 
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The palantir installer build code includes the ability to slightly change the contents of binaries in ELF, PE, 
or Mach-0 format (the executable/shared object formats used on Linux, Windows, and OS X, 
respectively), thus changing the checksums of those files. If this behavior is desired, set the bin_salt 
option to some 32-bit integer. This integer will be written to an unused location in each executable and 
shared object file that goes into the palantir installer packages. For PE files, which have an internal 
checksum, the checksum will be regenerated using a python implementation of the PE checksum 
algorithm. 

By default, the palantir installer only deploys byte-compiled (pyc) files of our l st -party palantir code (3 rd - 
party code such as standard python modules are still deployed as normal python source tiles). 
Sometimes, for debugging purposes, it is useful to have the source python files of l st -party palantir code 
deployed also. If this is desired, set the inst_keep_src option to True. 

Assuming all of the above options were desired, you would end up with something like the following in 

rc/palantir . rc: 

[palantir] 
svc_port: 46000 
usr_port: 46001 
bin_salt: 4277009102 
inst_keep_src : True 

WARNING: There are a few important things to keep in mind when making palantir configuration 
changes: 

1. The above settings affect how the palantir installer packages are built. If you change them, you 
must re-run make in tybase in order to build new palantir installer packages. 

2. If you change the palantir ports after having already installed palantir on some resources, you 
must reinstall palantir because the previously-installed instances of palantir will be 
communicating on the old ports. 

3. If you change the palantir ports, ALL users of your range must have the same ports set in their 
palantir rc tiles, or none of their tests will be able to communicate with palantir on the test 
resources. The best way to avoid this problem is to check rc/palantir . rc into a central 
repository so that others who use the repository will receive your changes. This will be done in a 
later step. 

3.8 Reaper Setup 

As root, in your tyworkflow directory in the remote commit root directory, create the file 

rc/ reaper . rc. In this tile, we'll define global reaper server configuration and, optionally, named 

reaper configurations for each ESXi server. 

Each computer in the tyrant database has a reaper field (which you imported previously). Each named 
reaper can have a separate configuration as far as the maximum number of concurrent reverts 
happening for that reaper, and the maximum number of attempts to make reverting a given resource 
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before giving up. In addition, the reaper server can have an overall limit on the maximum number of 
reverts the server has running at once. Defining a separate reaper for each ESXi server is a convenient 
way to appropriately manage load on your ESXi servers. Each reaper corresponding to an ESXi server can 
be set for the proper number of concurrent reverts based on how powerful that ESXi server is. 

Creating the named reaper configuration sections in the reaper RC file is not necessary. Named reapers 
which lack explicit configuration in the reaper RC file will be set with default settings (currently, five 
concurrent reverts and two tries). Creating an explicit section in the reaper RC file is only necessary if 
you want to override these defaults. 

If you wish to change the reaper-server-level maximum concurrent revert setting, set the 
max_children option in the reaper section to the desired number. If not set, the same default 
setting as used for reapers which lack a reaper.rc section will be used at the server level. 

To alter settings for specific named reapers, create sections named reaper_NAME, where NAME is the 
DNS name of an ESXi server (and the same as what you put in the reaper held when creating your 
computer CSV file in the previous section). In this section, you may define two options: 

• max_child r en (maximum number of concurrent reverts on this ESXi server) 

• max_t r ies (maximum number of attempts to revert a computer before giving up). You will 
likely only want to set max_children, if even that. 

Example: 

[reaper_NAME] 
max_children = 10 

If you wish to override the settings for physical machines (which you set to use the "clonezilla” reaper 
earlier), then simply create a “reaper_clonezilla” section and set the max_children or max_tries options 
as desired, just like with the ESXi reapers above. For example: 

[ reape r_clonezilla] 
max_children = 8 
max_tries = 1 

Finally, install the Reaper as a system service. On the testing server, change into the install directory of 
the tyworkflow clone in your remote commit root directory (e.g. /proj/testing/tyworkflow/install). Run 
. /setup_service . sh reaper 

which will install the reaper init script and enable it so that the Reaper will start at boot. Run 
/etc/init . d/reaper start 
to start the Reaper immediately without having to reboot. 

3.9 Plunger Setup 

As root, in your tyworkflow directory on the Tyrant server, if you wish to change any settings from 
default or add extensions, create the file rc/ plunger . rc. In this file, set plunger configurations. The 
default configuration file, rc/ defaults/ plunger.rc describes each configurable setting. 
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To run plunger, execute the following command: 
bin/plunger -x 

You many run plunger with verbose debug output with the -v argument: 

bin/plunger -x -v 

To stop plunger, run the command: 

bin/plunge r_admin service_shutdown 

If desired, configure the test server to automatically start the plunger daemon at boot time. For 
example, put an entry in /etc/ rc . d/rc . local, like the following: 

(cd /proj/testing/tyworkflow && bin/plunger -x) 

3.9.1 Plunger Admin 

When the Plunger service is running, additional commands can be run to change configurations or 
resynchronize database time. For descriptions of these commands, run 

bin/plunger_admin -h 

3.9.2 Plunger Extensions 

If you wish to add plunger extensions, you must create an importable module with a defined function 
plungerAction( *args, **kwargs). This function must return True on successful completion, 
or raise an exception upon any error. The exception will be caught by Plunger to print out to the plunger 
log file. 

For example, the upsync leafbag has been setup for Tyrant. The following extension exists in the 
configuration file: 

[ plunge r_extensions] 
upsync = upsync . plunger . plungerAction 

The "upsync” extension imports the module upsync. plunger.plungerAction. The plungerAction module 
defines the function plungerAction() and returns True: 

def plungerAction( *args, **kwargs) : 

p = configuration . read_config( ' upsync . rc ' ) 

upsync_history_limit = p.getint( 'plunger ' , 1 upsync_history_limit ' ) if 
p . has_option( ' plunger 1 , 1 upsync_history_limit ' ) else 7 
upsync_db = DB( ) 
try: 

upsync_db . del_upsync_history ( upsync_history_limit ) 
finally: 

upsync_db . close ( ) 
return True 
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3.10 Overview Setup 

In your HTTPD document root, create a symlink which points to the src/tyworkf low/overview 
subdirectory of your tyworkflow copy in the remote commit directory made earlier. If necessary, enable 
the FollowSymLinks option on your HTTPD document root. 

At this point, navigating to the test server's / overview directory (e.g. 
http://testserver.example.com/overview) should present you with a menu of overview pages 
(Namespaces, Recipes, Resources, Management). If you click on Management, you should see a table 
listing all the resources you imported in the previous step. 

Verify that resources are set as ‘dirty’. This will ensure they are reaped properly at the beginning of the 
first tests run. 

3.HOverview Test Scheduler Setup 

These steps assume that the web server user (“apache”) has been added to the sudoers file as 
described in the pre-Tyrant setup. To complete the setup for the Test Scheduler do the following: 

• Within the /etc/cron.d/ create the empty text file “ATL-Tyrant-Cron-Scheduler” 

• Ensure “ATL-Tyrant-Cron-Scheduler” has 644 permissions and is owned by “apache” 

• Ensure “< tyworkflow root>/src/tyworkflow/overview/cron/cron_start.sh” has 755 
permissions (it should be owned by root) 

• Ensure “/etc/sysconfig/crond” includes the line: CRONDARGS=”-p” and restart crond 
(systemctl restart crond. service) 


If Test Scheduler is failing to render properly ensure the php-posix library has been installed 
(yum install php-posix) as described in the pre-Tyrant setup. 

3.12 Intermediate Step: Check In Configuration Changes 

Now that you’ve configured the reaper on the test server, check in the configuration file changes. To do 
this: 

• change directory into your tyworkflow clone 

• run hg add rc/* . rc (this causes mercurial to start tracking the rc files you created earlier) 

• run hg commit (and provide a useful commit log message) 

• run hg push. If you’ve made other local changes and you don’t want to commit those, then 
make sure to exclude them from your commit. (NOTE: This won’t work if you did not set up a 
mercurial server as prescribed in earlier steps of the manual.) 

3.13 Clone Setup 

Tyworkflow provides support for cloning resources if the HAL has support for cloning for that resource’s 
hardware type. Tyworkflow’s cloning support will automatically select new unique MAC and IP 
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addresses for the clone in order to prevent network conflicts between the clone and source. Some 
configuration is required to make this work smoothly, and one important setting differs between the 
tyworkflow instance on the test server and the users’ instances. 

Create the file rc/clone . rc in tyworkflow on the test server, and create a section within it named 
clone. Only commonly-set options are explained here; see rc/defaults/clone . rc for more 
detail. In this section, set the following options as desired: 

• dhcpd_commit_cmd: The command to use to restart the DHCP daemon on the test server. 
The defaults file provides some command examples. 

• use_arp: Set to True to have an ARP ping done for extra assurance that an IP address that is 
available according to the tyrant database is not in use elsewhere on the network. 

• ar ping: Optional path to specify the location of arping program, if it’s not present at the 
default location. Only applies if ARP is turned on. 

• ar ping_dev: Device from which the ARP ping should originate. Must be set if the test server 
has multiple network interfaces. In this case, set it to the interface which connects to the 
network the test resources are on. Only applies if ARP is turned on. 

• xmlr pc_host: Set this to the IP address or host name of the test server. When set, cloning 
operations will use the XMLRPC server rather than the local code backend. This MUST NOT be 
set on the tyworkflow instance on the test server (the one powering overview and the reaper), 
but MUST be set for user instances (e.g. a developer’s tyworkflow from which remote_commit is 
run). 

After the above options have been set properly for your environment (including setting xmlr pc_host 
appropriately for a developer), add and commit the r c/ clone . r c tile just like you did the other rc 
files in the previous section. Then, after pushing that change, unset the xmlrpc_host setting so that the 
tyworkflow instance on the test server will NOT use the clone XMLRPC daemon. 

Next, start the clone XMLRPC daemon on the test server. This daemon is necessary so that clones can be 
initiated remotely, and so that users who use remote_commit with a non-root user account are still able 
to initiate clones (since cloning involves some steps that require root privileges). In the root of 
tyworkflow, run: 

bin/clonexmlrpcd 

If desired, configure the test server to automatically start the clone XMLRPC daemon at boot time. For 
example, put an entry in / etc/rc . d/rc . local, like the following: 

(cd /proj/testing/tyworkflow && bin/clonexmlrpcd) 

The clone XMLRPC daemon can be managed by bin/clonexmlr pc_admin. Run with the - h option 
to see available functionality. 

3.13.1 Destroying Clones with a vCenter Server 

The clone feature of tyworkflow allows one to create and then later delete clones. The normal operation 
of deleting ESXi VM-based clones involves connecting directly to the ESXi host on which the clone resides 
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and deleting it from there. In a range controlled by a vCenter server, this will result in the deleted clone 
showing up as orphaned in the vCenter server’s view of VMs (i.e. when connected via the vSphere 
client). To prevent this, you can inform Tyrant of the username and password used to log in to the 
vCenter server. If this is configured, then when destroying a clone, Tyrant will first log in to the ESXi host 
the clone resides on, query the host to determine if it is controlled by a vCenter server, and then log into 
the vCenter server and perform the deletion there (which effectively deletes the clone from the ESXi 
host and does not leave an orphaned VM in the vCenter server view). To enable this feature, set the 
vc_user and vc_pass options in the esxi section of rc/hal . rc to the username and password, 
respectively. For example, you would put something like the following in rc/ clone . rc: 

[esxi] 

vc_user: Administrator 
vc_pass: admin_password 
Commit and push this change. 

3.14 Test Resource Setup 

Now we move to setting up the actual resources which will be used in automated tests and the 
infrastructure to support them. For both VMs and physical machines, this means installing the 
base OS and any other applications, installing palantir, and then snapshotting the VM. For 
physical machines, after installing the OS and all software, you’ll run a provided command which 
will save a clonezilla image of the state of the computer. If you have multiple physical machines 
of the same model, you need only build each desired recipe on one of the actual physical 
machines of that model because the clonezilla image taken from one machine is usable on all 
other machines of the same model. 

3.14.1 ESXi Virtual Machine Infrastructure 

Install and configure the ESXi servers as necessary to support the desired size of your range (instructions 
for this are beyond the scope of this document). Ensure that the ESXi servers are reachable from the test 
server and developer workstations over the network at the IP addresses or hostnames you set when you 
imported VM hosts earlier. Also, set the username and password on the servers according to what you 
set when importing VM hosts. 

If you intend to have other networks than the main network which VMs are normally connected to 
which will be used for testing, ensure that each ESXi host has a connection to that network, and that the 
name of that network is identical between each ESXi host and identical to the value in the "name" field 
of the vlan table. If test traffic (i.e. palantir traffic) is ever intended to flow over this network, then also 
ensure that any locations on which tests are run (the test server and possibly developer workstations) 
are able to connect to this network. These points about network setup are particularly important given 
the network switching functionality detailed in the user manual. 

3.14.2 PDU & Physical Machine Hardware Setup 

Physically install the PDUs which will be used to control your physical machines and the physical 
machines themselves. Set up the PDUs with network access at the IP addresses or hostnames you set 
earlier and enable SNMP interaction with the PDUs. 
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Connect each physical machine to the correct outlet (according to the pdu_OUtlet setting you 
provided for each machine when you imported it). For each machine, configure the outlet’s power cycle 
delay on the PDU. This is the amount of time the PDU will wait between powering an outlet off and 
powering it back on when a power cycle command is issued to the PDU. If the delay is too short, then 
the computer will not automatically power itself back on when power is restored, even if the BIOS 
settings are configured properly. The necessary delay is machine-specific and must be determined 
empirically. For APC PDUs, perform the following steps to set the delay on an outlet: 

1. On the Tyrant Server, telnet into the APC PDU with the command: 

telnet [ape ip address] 

2. Enter the username and password when prompted. (Unless the username and password was 
reset, it will be the default credentials for APC PDUs) 

3. After successfully logging into the APC PDU, the Control Console menu will be displayed. Enter 
“1” to select Device Manager. 

4. In Device Manager, enter “2” to select Outlet Management. 

5. In Outlet Management, enter “1” to select Outlet Control/Configuration. 

6. Press the Enter key again for input prompt. 

7. Enter the outlet number you wish to reconfigure. 

8. Enter “2” to select Configure Outlet. 

9. In Configure Outlet, enter 4 to set Reboot Duration. 

10. Enter the delay in seconds. 

11. In Configure Outlet, the “Pending” status should appear next to Accept Changes. 

12. Enter “5” to Accept Changes. If changes were successful, the “Success” status appears next to 
Accept Changes. 

13. To configure other outlets, press the Esc key twice to the Outlet Control/Configuration outlet list 
and repeat steps 6-12 for every other outlet. 

14. Finally, logoff the APC PDU by pressing the Esc key until the main Control Console screen 
appears. Then enter “4” to Logout. 

For each physical machine, configure its BIOS so that the machine will automatically turn on whenever 
power is restored to the outlet the machine is connected to. Also, configure each machine’s BIOS so that 
the machine attempts to netboot (PXE boot) before attempting to boot the local hard drive. Depending 
upon the machine, this may be as simple as including a “network boot” item in the machine’s boot order 
settings, but it may also require enabling PXE boot for the network interface in a different section of the 
BIOS menu. Without these two crucial BIOS settings, the machine will not be usable for automated tests. 

3.14.3 ESXi Virtual Machine Resource Setup 

Build out your range of virtual machines with whatever method you see fit (e.g. hand-installation, some 
sort of autobuilding system, cloning). 

Configure networking so that the test server can connect to the virtual machines. For the virtual 
machines, it is important that they keep the same IP addresses. Therefore, either configure the VMs 
with static IP addresses, or give them DHCP reservations based on MAC address so each VM always gets 
the same IP address. When running test scripts, connect to your VMs by IP address, not by DNS 
hostname, even if you have DNS set up (there are currently some parts of Tyrant which may break if DNS 
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names are used). Whichever method you choose, remember to set the IP addresses for each VM 
properly according to what you previously imported to the database. 

Install palantir on the VM (see section 3.15). 

Take a snapshot of the VM. It is this snapshot to which the reaper will revert the resource to cleanse it 
prior to running a test on it. 

Administrator note: Each VM must have a name which is unique across the entire range, as DART 
references VMs by this field. Our naming convention is to include the base OS recipe name and the last 
two octets of the MAC address. For example, a Windows XP Professional SP3 32-bit English VM whose 
MAC address ends with 01 : fe would be 01- f e-win -xppro-sp3-en-x86. The inclusion of the 
MAC address guarantees global name uniqueness as long as there are no more than 65,536 VMs in the 
range. 

Administrator note: ESXi has some optimizations that come into play when multiple similar VMs reside 
on the same ESXi host. Therefore, we recommend collocating VMs of the same or similar base OS on the 
same ESXi host. 

Administrator note: To increase user effectiveness, there are some optimizations administrators can 
make on their VMs and snapshots. You can use the Microsoft tool bginf o to include useful system 
information on the computer background. You can also ensure that the screen size is small enough that 
the whole screen is visible without scrolling. These two simple changes can greatly increase usability of 
the system. 

3.14.4 Physical Machine Resource Setup 

For each desired combination of model and recipe, build out the recipe on one instance of the computer 
model. As with VMs, configure networking so the test server can contact the machine. You will almost 
certainly want to use DHCP with physical machines so that a specific IP address is not baked into the 
image which will be created shortly (since the same image may be restored to different physical 
machines, which should have different IP addresses). 

Note: To decrease the time necessary to run tests on physical machines, we recommend building each 
recipe to use the minimum reasonable amount of hard drive space. For example, when installing 
Windows, rather than allow it to fill the entire hard drive, limit it to 10GB (or whatever is reasonable for 
the OS you’re installing). Although the clonezilla tool used to image the drive is intelligent enough not to 
store empty space, imaging a large, mostly-empty partition still takes longer than a smaller one since 
clonezilla’s tools must scan the entire partition). 

Install palantir on the machine (see section 3.15). 

Once a model/recipe combination is built, run the following command from the root of tyworkflow. This 
command uses the HAL to save the state of the computer (in this case, as a clonezilla image since you set 
the state_type field to clonezilla when you imported your physical machines previously) and 
automatically adds a resource entry to the database linking this computer to the image it automatically 
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creates. 

bin/create_resource <COMPUTER> <RECIPE> <STATE> 

where the arguments are as follows: 

• <COMPUTER>: the name of the computer (identical to what you put as the name when you 
imported the computer) 

• <RECIPE>: the name of the recipe currently on the computer (identical to one of the recipe 
names you used when you imported recipes) 

• <STATE>: the name to save the state as (similar to snapshot name for ESXi VMs). For clonezilla 
images, the state name must be unique for all states which apply to a given hardware 
type/model combination, but is not required to be globally unique (i.e. all clonezilla saved states 
for physical machines with model set to “dellpt3500” must have unique names, but a clonezilla 
saved state for a “dellpt3500” and a “dellp390” may have the same name without conflict). 

For all other physical machines of the same model, run the same create_resource command as 
above, but substituting the other computers’ names instead of the first computer’s name. This links the 
clonezilla image created by the first create_resource run to the other physical machines of the 
same model, so that the recipe you just set up is usable on all instances of the same kind of computer. 
Only the first run of create_resource will actually save the image; the subsequent runs will see that 
the given image already exists and automatically link it in the database with the subsequent computers. 

Once you’ve finished setting up all the desired model/recipe combinations, you’ll need to run a similar 
command as you ran when importing to set newly-imported resources as usable for testing: 

bin/db_admin set_computer_flag testing_use 1 Y 1 hwtype= ' phys ' 

3.15 Palantir Installation 

To install palantir on an asset, do the following: 

1 . From the root of tybase, run make. 

a . NOTE: If you intend to make palantir configuration changes (as in section 3.7), make 
those changes before you run make. 

2 . For each resource Palantir is being installed on, copy the appropriate platform-specific 

installation directory from src/tybase/palantir/installer/distro to the resource. 
The subdirectories of distro are named for the operating system and architecture they work 
on. 

3 . On each resource, with the Palantir installation directory you copied over as the current working 

directory, run the installation script. For Windows, this is setup . bat. For Linux, it's 
setup_service . sh. 

a . NOTE: Linux test resources must have zip installed (the zip and unzip packages on 
Fedora). 

4 . If some of your resources have firewalls running on them, they may block palantir. You may 
notice this immediately upon installing palantir, if the firewall is triggered by listening sockets, or 
you may not notice it until a connection attempt is made. In either case, make whatever 
changes are necessary to the firewall configuration to allow palantir to accept connections. 
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Typically, this means opening ports 51134 (and for Windows, also opening port 51135 and/or 
allowing a "pythonw.exe” process to accept incoming connections). 

5 . Verify successful Palantir installation by running the following command from the root of tybase 
(this assumes your tybase copy is on a machine which can access the resources over the 
network), where IP_ADDR is the IP address or hostname of the resource: 

bin/palantir_admin -s <IP_ADDR> service_ping 

The last line should end something like 

[INF] root 2647: service_ping( [] , {}) = True 
It should say True at the end; if it says False, then Palantir is not properly installed. If the 
resource is a Windows resource, also run the following similar command (replacing 51135 with 
your user port setting if you changed that in rc/ palantir . rc): 

bin/palantir_admin -s <IP_ADDR> -p 51135 service_ping 
It should also say True at the end. The purpose of this second command is to test the user 
instance of Palantir, which is provided on Windows. 

o If these tests fail and you see a firewall popup on your test resource, you may need to 
change the firewall configuration to allow palantir to communicate. 

3.16 Palantir Update 

If a version older than 8.0 is already installed on a resource, you must uninstall the older version and 
install the newer one. To update Palantir, do the following: 

1. Run steps 1-2 in section 3.15 above. The latest version of Palantir should be copied over to each 
resource. 

2. Stop all Palantir services on the resource on which you want to update palantir. 

• On Linux, in a terminal, run 

service palantir stop 

• On Windows, in the command line console, run 

c : \palantir_service\service . bat stop 

Open Windows Task Manager. If additional “pythonw” processes exist, select them and 
click End Process. 

An example of this is shown below. 
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3. Delete Palantir directories of the older version. 

• In Linux, these directories include: 

/palantir_service 

• In Windows, these directories include: 

c : /palantir_service 

c : /palantir_user 

Run the remaining steps 4-6 in the Palantir Installation section above to install the new version 
of Palantir. 

3.17 Range Setup Verification 

Before moving on, we'll verify the proper setup of the range. To do so, we'll use the process_plan 
and remote_commit tools to evaluate and submit a simple test plan which verifies Palantir 
connectivity to each resource and verifies that the resource actually has the recipe which the database 
says it has. The plan is set up such that it will run once on every single combination of recipe and 
computer (which, for our purposes, means every single resource in the range). 

3.17.1 Local Verification 

First, we'll perform a local (to the test server) solve operation on the plan. This will tell us how many 
combos the plan will generate. In your tyworkflow clone in the remote commit root directory, run the 
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following command: 

bin/process_plan solve overlib . preflight . verify_os_plan 
This command will potentially generate a lot of output, but at the end will say how many combos were 
generated with a line like the following: 

Plan overlib . preflight . verify_os_plan generated 50 combos. 

The number of combos generated should be equal to the number of resources shown on the overview 
Management page or in the output of bin/db_admin list_resources. If not, this indicates that 
some of the resources may not have been properly imported, or that some of the resources were left 
marked as reserved. On the overview Management page, check that none of the test resources are 
marked as reserved (yellow background) or fubar (gray background). If any are marked reserved, check 
the box next to each one and then click the “Use Testing” button on the left side. 

3.17.2 Remote Verification 

Next, we'll actually run this plan. We'll do this from the position of a developer using remote_commit, 
which will allow us to verify proper Overmind functioning, proper Reaper functioning, and proper 
Remote Commit functioning. 

• Log in to a developer's workstation. On this computer, clone tybase and tyworkflow into the 
same base directory. 

• In tyworkflow, run make as before. Also, like before, configure the db . rc file to use the 
Overmind Database located on the testing server. Run the same solve test as above from here 
to verify remote database connectivity. 

• In tyworkflow, configure remote commit to use the environment you set up on the test server. 
Copy rc/def aults/remote_commit . rc to rc/ and modify it as follows: 

o remote_commit/remote_user: the user to connect to the testing server as (via 
SSH when syncing up files or running commands; defaults to root, which is typical) 
o remote_commit/ remote_host: the hostname or IP address of your testing server 
o remote_commit/ remote_commit_dir: the path on your testing server to the 
commits directory (/proj /testing/ commits in our example). 

O For the above settings, make sure you uncomment them. That is, remove the semicolon 
from the beginning of each option line that you modify. 

• Now, remotely submit the ping plan. Run the following command from your tyworkflow copy: 

bin/remote_commit run overlib . preflight . service_ping_plan 

Several things should happen when this command is run: 

o The current state of your tybase and tyworkflow copies will be written to text files in the 
tyworkflow directory. If you set up the mercurial server to use authentication, then this 
step may prompt developers for a password. If that happens, the developers should add 
their password to their personal hgrc files (at -/ . hgrc). If they don't do so, they'll 
have to type their password multiple times whenever they submit tests. Set up the 
password in the hgrc file with the following content: 

[auth] 

testserver . prefix = http://testserver.example.com/hg/ 
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testserver . username = <USERNAME> 
testserver . password = <PASSWORD> 

O Your tybase and tyworkflow copies will be synced up to a subdirectory of the commits 
directory named for the user you're running as. If this prompts you for a password, that 
indicates that the public SSH key of the user you're running as was not properly added to 
root's . ssh/authorized_keys file. Most SSH servers are rather strict on the 
permissions of this file, so make sure the file has mode 600 if you created it new. If you 
receive an error that your commits directory doesn't exist, that likely means there is no 
subdirectory of the commits directory on the test server named for the user specified. 

0 An instance of Overmind for scheduling your tests will be started on the test server, 
running from the tyworkflow directory just synced up. 

0 The service_ping_plan test plan will be submitted to this remote Overmind instance. 
When this occurs, you should see a line of output telling you how many combos were 
generated, similar to when you ran process_plan solve. This number of combos 
should agree with the previous number of combos from your process_plan solve 
tests. 

• Finally, remotely submit the OS verification plan. Run the following command from your 
tyworkflow copy: 

bin/remote_commit run overlib . preflight . verify_os_plan 
O This plan checks all of the host operating systems against what is in the database, and it 
checks to ensure that Palantir is running properly on each hosts in a more thorough 
manner than the previous ping test. 

Once any run of remote_COmmit is successfully completed, you should see a namespace for your test 
on the overview Namespaces page (e.g. 

http://testserver.example.com/overview/test_namespaces.php). The number of test cases in the total 
column should equal the number of combos returned previously. As you refresh this page, test cases 
should complete and move to the success column. The ping test and OS verification tests are quick, so 
users ought to see some test cases complete after only a couple minutes (though physical machines will 
take significantly longer because restoring clonezilla images is slow relative to reverting ESXi snapshots). 
When the plan is complete, the count in the success column should equal the number of combos and all 
other columns should be zero (i.e. all test cases have completed successfully). Assuming this happens, 
your range is ready to go. 

If any test cases complete with other statuses, then something is wrong with the resources used in those 
test cases. Refer to the section 4.12 for common problems you may encounter, and refer to section 4.11 
for how to fix a saved state, since the process differs for physical and virtual machines. 

3.18 Final Steps 

Now that you’ve made more configuration changes (the remote commit settings made in your 
tyworkflow clone on the developer workstation), you should again check them in. This will make it 
easier for future users to utilize the overmind range you’ve set up, as they won’t need to mess with 
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configuring overmind and remote commit themselves. Like before, change directory into your 
tyworkflow clone (that you used to remote commit from) and run hg commit, then hg push. 
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4 Range Maintenance 

After setting up a range, it is sometimes necessary to perform maintenance tasks on it. This section 
explains how to perform some common maintenance tasks with either the overview web interface or 
the db_admin command-line tool. This, and other command line tools in Tyrant, have usage 
documentation available by running the tool with the “-h” option. 

Note also that the db_admin tool supports the command line switch “-j”, which causes the tool to 
print JSON-encoded output instead of output formatted nicely for a human to read. If you ever have a 
need to programmatically parse and consume db_admin output, the “-j” switch is the way to go. 

4.1 Viewing Resources 

A simple task to start with is to view the list of resources. As has already been covered, a user can see 
the list of resources in overview on either the Resources or Management page. Both show similar 
information, but Management adds some ability to perform tasks on the resources. 

On the command-line, db_admin's list_computers and list_resources subcommands allow 
viewing the computers and resources. The difference is that list_computers shows one entry for 
each computer in the range (in our case, each VM). list_resources shows each resource, so if a 
given VM has multiple named snapshots defined in the Overmind Database, then there will be multiple 
entries for that VM in the resources listing. For our purposes, there is one and only one snapshot for 
each VM, so the number of computers and resources will be identical. 

The list_resources and list_computers commands accept filters to specify what subset of 
resources/computers to show. The filter can specify values for the fields defined on a resource as 
follows: pool, ip, mac, recipe, vlan, snapshot, reaper, reserve_name and 
reserve_time fields, (all seen as column titles on the overview Management page). Also, the 
testing_use, testing_in_use, testing_dirty and testing_f ubar (only for resources) 
flags may be specified (these fields match the corresponding columns in the overview Management 
page, sans the test ing_ prefix). 

So, to show all computers in the "poolOOl” resource pool, you would run: 

bin/db_admin list_computers filter_by="pool=pool0Ol" 

To see the resource with the IP address “192.168.56.101”, use the command: 

bin/db_admin list_resources filter_by="ip=192 . 168 . 56 . 101" 

4.2 Reserving Resources 

When performing maintenance, one of the first things to do is reserve the computer you'll be working 
on. Reserving a computer prevents it from being scheduled for tests, so that it doesn't suddenly get 
used while you're working on it, and so that any changes you make to it while servicing it don't cause 
problems for developers' tests. 
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To reserve a computer in overview, navigate to the Management page (e.g. 

http://testserver.example.com/overview/add-computer.php). This page lists all the resources in the 
range and allows various tasks to performed on them. Check the box on the row for the computer you 
want to reserve, enter your name or some other sensible identifier in the “Reserver's Name” field on the 
left side of the page, and click the “Reserve” button. When the page refreshes, the resource you 
selected will have a yellow background for its row and the "use” field will say “N”. To un-reserve a 
resource, making it available for testing again, check the resource's box and click the “Use Testing” 
button. The row will lose its yellow background and “use” will say “Y”. 

With the db_admin command-line tool, you can reserve a computer with the reserve_asset 
subcommand and un-reserve it with unreserve_asset. reserve_asset takes as its first 
positional argument the reservation name to store on the computer (same as the “Reserver’s Name” 
field in overview). Following that, keyword arguments may be specified which give computer attribute 
values to match; only computers with matching values for those attributes will be reserved. Run 
bin/db_admin -h list_computers 

to see a full list of the fields which may be matched on. The unreserve_asset command takes 
these same matching keyword arguments to specify computers to unreserve. 

To reserve a computer by name, you would run (from tyworkflow): 

bin/db_admin reserve_asset my_name name=comp001 

To reserve all computers which are currently not reserved: 

bin/db_admin reserve_asset my_name testing_use=y 

To reserve all computers in a given pool: 

bin/db_admin reserve_asset my_name pool=pool001 

To unreserve all resources previously reserved with a given reservation name: 

bin/db_admin unreserve_asset reserve_name=my_name 

To unreserve all resources, simply specify no filter: 

bin/db_admin unreserve_asset 

4.3 Deleting Computers and Resources 

To delete computers from overview, navigate to the Management page, check the box for the resource(s) 
to be deleted, and click the “Remove Computer(s)” button. Overview does not have the capability to 
delete individual resources. 

With db_admin, the del_computer subcommand will delete a computer (and all snapshots defined 
for it). del_COmputer takes the name of the computer to delete as its first argument, e.g.: 
bin/db_admin del_computer comp001 

The del_snapshot subcommand deletes snapshots. Either all snapshots for a specific computer may 
be deleted or only those snapshots with a given recipe. To delete all snapshots for a named computer, 
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you'd run: 

bin/db_admin del_snapshot compOOl 

To delete only the Windows XP SP3 x86 English snapshot for that computer, you'd run: 

bin/db_admin del_snapshot compOOl win-xp-3-en_US-x86 

(assuming that is the precise name given to the Windows XP SP3 x86 English recipe). 

To delete a resource (not a computer and all of its associated snapshots), you must delete both the 
snapshot state and record. Deleting a resource is more common for physical machine resource 
management. 

To delete the snapshot state, first list the existing snapshots for the computer in tybase root. This will 
return a list of current states for the computer on the command line: 

bin/comp_admin name=test_comp list_states 

Then, delete the state for the resource you wish to delete: 

bin/comp_admin name=test_comp delete_state state_name 

Finally, in tyworkflow root, delete the snapshot for the computer resource: 

bin/db_admin del_snapshot test_comp snapshot=state_name 

4.4 Modifying Computers and Resources 

Overview provides limited capability to make changes to computers. Using the box on the left-hand side 
of the Management page, one can change the resource pool or vlan a computer is in, change the reaper 
used for it, or reset its testing_in_use (whether or not the resource is currently being used for a 
test) or testing_f ubar (whether the resource is currently broken) flags. 

On the command-line, several db_admin subcommands provide full computer/resource modification 
functionality. 

The set_computer_attr subcommand allows modifying the name, pool, ip, mac, vlan and 
reaper attributes (all the attributes which relate to a computer) as well as the extended attributes 
vm_host, pdu_host and pdu_outlet (meaning you can use set_computer_attr to change 
the database’s understanding of which VM host a VM resides on, or which PDU a physical computer is 
connected to). This command takes filters as previously explained. To change the pool of a named 
computer 

bin/db_admin set_computer_attr pool new_pool name=pool001 

The set_computer_f lag subcommand allows modifying the testing_use, testing_in_use 
and testing_dirty flags, e.g.: 

bin/db_admin set_computer_flag testing_dirty R name=comp001 
The set_resource_flag subcommand allows setting the resource flag testing_f ubar. 
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Note that for most of the attributes being set with set_COmputer_attr, you must first add the value 
to the database. This is due to how attribute values are stored in the database. We hope to simplify this 
situation as a future enhancement. For now, this complexity applies to all the computer attributes 
except reserve_name, reserve_time, and snapshot_id. For the attributes ip, mac, hwtype, 
model, pool, vlan, state_type and reaper, you can use the add_attr subcommand, as these 
attributes are simple values. For example, if you wanted to change the IP address of a computer named 
“testcomp” to the value "192.168.6.101”, you would do: 

bin/db_admin add_attr ip 192.168.6.101 

bin/db_admin set_computer_attr ip 192.168.1.101 name=testcomp 
For the vm_host and pdu_host extended attributes, you cannot use add_attr as these attributes 
actually provide linkages to the tables holding information about VM hosts, and PDUs. Therefore, you 
must first use either the add_vm_host or add_pdu subcommands to add a record for the desired VM 
host or PDU, and then use the “host” field value as the value argument to set_computer_attr. For 
example: 

bin/db_admin add_vm_host ehost01 esxi root root_pass 
bin/db_admin set_computer_attr vm_host ehost01 name=testcomp 

See bin/db_admin -h for much more detail about modifying entities in the database. 

4.5 Deleting Old Test Results 

As a range is used over time, more and more test results build up in the database and on the test 
server's disk. The db_admin tool provides the ability to delete individual test cases, test plans, or 
entire test namespaces. Test records are removed from the database, and the test output directories are 
also optionally deleted. 

To delete a testcase, identify the testcase ID (this can be seen in Overview on the detail page for a single 
test case, or the test case list page for a plan or namespace). Then, run the following command: 
bin/db_admin del_testcase <tid> 

where <tid> is the testcase ID. You will be asked to confirm that you wish to delete the testcase from 
the database, and then also asked to confirm deletion of the testcase's output directory. If you want the 
testcase to be deleted immediately, without asking for confirmation, you can add the string "+really-do- 
it" to the end of the previous command (this string also works to prevent confirmation prompting for the 
other test result deletion commands outlined in this section). Note, however, that you will always be 
prompted to confirm deletion of test output directories, regardless of what you put on the command 
line. 

To delete a test plan, identify the test plan ID (which is visible in Overview on the test plan list page for a 
namespace, as well as on the test case list page for the plan or the test case detail page for any test case 
in the plan). Then, run the following command: 

bin/db_admin del_testplan <pid> 

where <pid> is the test plan ID. You will prompted to confirm deletion of the plan from the database, 
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and for deleting the directory containing all the test case output directories on the test server. Again, 
adding "+really-do-it" will suppress prompting for confirmation for deleting the plan from the database. 

To delete an entire namespace, you can use either the name of the namespace or the namespace ID. 
These pieces of information are visible in Overview on the namespace list page, or any of the list or 
detail pages for any element (e.g. test case, test plan) in the namespace. Then, run the following 
command: 

bin/db_admin del_namespace <name_or_nid> 
where <name_or_nid> is the name or the namespace ID. As above, you will be prompted to delete the 
namespace from the database, and to delete the directory containing all its test case output directories 
from the server, and the "+really-do-it" string will suppress prompting for confirmation of deletion from 
the database. 

WARNING: Take care when using these commands, as they are destructive! A small typo on the 
command line could cause you to delete a namespace you didn't intend to delete. Also, pay close 
attention to the filesystem path(s) you are asked to confirm to delete. Though this has never happened 
in practice, in theory, if an incorrect filesystem path got into the database somehow, it could be possible 
for much more than was intended to be deleted. So, look closely at the path(s) the command tells you it 
is going to delete and make sure they make sense. 

4.6 Managing VM Hosts and PDUs 

Overview has support for viewing VM hosts and PDUs present in the database but does not provide the 
ability to modify their attributes. 

On the command-line, the db_admin subcommands list_vm_hosts and list_pdus allow you to 
see the VM hosts and PDUs in the database. These work the same as the other list_* subcommands 
as far as arguments used to modify their results. 

The del_vm_host and del_pdu subcommands let you delete VM hosts and PDUs. Both take as their 
only argument the host field of the respective entity. For example, to delete a VM host name "abdon”, 
you would run: 

bin/db_admin del_vm_host abdon 

The subcommands set_vm_host_attr and set_pdu_attr provide the ability to modify VM hosts 
and PDUs. For both entities, all fields except the ID may be modified. The complexity mentioned 
previously with set_COmputer_attr does not apply to modifying VM host and PDU information. 

For example, to change the username of a VM host whose host field is set to “abdon”, you would run 
bin/db_admin set_vm_host_attr user new_username host=abdon 
To change the host field of a PDU whose host field is currently “apcOl”, you would run 
bin/db_admin set_pdu_attr host new_hostname host=apc0l 
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4.7 Managing Database Users 

In section 3.4.5 you learned howto create database users with the add_dbuser subcommand to 
bin/db_admin. Several other subcommands are defined for working with database users. Note that 
these subcommands should only be used on database user accounts created with add_dbuser. 

The list_dbusers subcommand lists all the database users that appear to have been created by the 
add_dbuser subcommand, and shows you the username and privilege level for each such user. 
list_dbusers accepts the same arguments as other list_* subcommands. For example, to list all 
users: 

bin/db_admin list_dbusers 

To list only users with test privilege level: 

bin/db_admin list_dbusers filter_by=level=test 

The del_dbuser subcommand allows you to delete a single database user by name. For example: 
bin/db_admin del_dbuser userOl 

The set_dbuser_password allows you to change the password for a named user. Like 
add_dbuser, you may either provide the new password on the command line, or omit it and then be 
prompted to enter and confirm it. For example: 

bin/db_admin set_dbuser_password userOl my_new_pass 

The set_dbuser_privlev subcommand allows you to change the privilege level of an existing user 
by specifying the existing user’s name and the desired new level. For example: 

bin/db_admin set_dbuser_privlev userOl admin 
would elevate the user “userOl” to the administrative privilege level (unless he/she were already at that 
level, in which case this would effectively be a no-op. 

4.7.1 Warning about Database Updates 

In the future, new tables or fields may be added to the Tyrant database and delivered in an update. In 
the case of the “test” privilege level, it is likely that in conjunction with changes to the database, there 
will be changes to the set of privileges granted to a “test”-level user. Thus, in the case of a database 
update, you will need to reset privileges for any “test”-level users. To do this, simply run the 
set_dbuser_privlev subcommand for each “test”-level user, setting their privilege level to “test”. 
This will cause the existing deprecated set of privileges to be removed, and the correct set to be applied. 

4.8 Controlling Computers 

As previously mentioned, the HAL provides support for abstract computer-level operations. For 
an administrator, these operations are exposed on the command line via the comp_admin tool 
in tybase. 

4.8.1 General Usage 

General usage of the comp_admin tool is as follows (run from the root of tybase): 
bin/comp_admin <COMPUTER_ATTRS> <CMD> [<ARGS> ...] 
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where: 

• <COMPUTER_ATTRS> is a comma-separated list of key/value pairs of attributes 
identifying the computer you wish to operate over (explained below). 

• <CMD> is the name of the computer-level operation to perform. 

• <ARGS> are arguments for the computer-level operation (not all operations take 
arguments). Keyword arguments may be specified in the format KEY=VALUE. 

Unlike undermine and palantir_admin, which simply use the IP address to connect to a test 
resource via palantir, comp_admin requires more metadata about the computer it is to operate on, and 
the specific pieces of metadata required differ based on the operation being performed and the type of 
computer the operation is being performed on. For example, in order to power off an ESXi VM, 
comp_admin needs to know the name of the VM, the hostname or IP address of the ESXi host the VM 
resides on, and the username and password for that host. To power off a physical computer, 
comp_admin needs to know the hostname or IP address, username, and password of the PDU the 
computer is connected to, and the identifier of the specific outlet the computer is plugged into on that 
PDU. 

These aforementioned pieces of computer metadata are readily available in the Overmind database, 
since you provided them when you imported your resources, but comp_admin is a tybase tool. How is 
comp_admin to obtain all this information? This is where the <COMPUTER_ATTRS> argument comes 
in. <COMPUTER_ATTRS> is a comma-separated list of key/value pairs of the computer metadata. The 
valid keys are the names of computer fields (e.g. the column headers returned by bin/db_admin 
list_computers). 

Using <COMPUTER_ATTRS>, there are two ways to provide comp_admin with the information it 
needs to perform the desired operation: 

• If (and only if) tyworkflow is linked to tybase (which would be the case if you cloned tybase and 
tyworkflow as siblings in the same directory and then ran make in tyworkflow), then 
COmp_admin will use the provided attributes as a filter to find a specific computer entry in the 
database from which it can get the necessary information. So, for example, you can simply 
provide the computer name, which is unique, and comp_admin can automatically obtain the 
rest of the information it needs. If zero or more than one computer entry in the database match 
the provided computer attributes, then comp_admin will fail. 

• If tyworkflow is not linked to tybase, then all necessary pieces of information must be specified 
on the command line. 

Our expectation (and strong recommendation) is that you simply ensure the Overmind database is 
available as comp_admin will be much easier to use that way. 

As with some other Tyrant command line utilities, values which begin with numbers or contain dashes 
must be quoted and the quotes must be escaped. 
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To bring this all together, here are some examples. Suppose you have a computer named 
“098lfwsllx64” which you wish to power off, and suppose that tyworkflow is linked to tybase, so that 
the Overmind database is available. You would then run: 

bin/comp_admin name=V0981fwsllx64V power_off 
Note the escaped quotes, since the value begins with a number. If the Overmind database were not 
available, you would have to specify all the necessary computer attributes on the command line, like so 
(note this command is all on one line, but is wrapped in this document): 
bin/comp_admin 

name=\ '0981fwsllx64\ ' , vm_host=esxi01 . example . com, vm_host_user=root, vm_ 
host_password=rootpass power_off 

If you wanted to save the state of a computer named “test_comp_001”, naming the saved state 
“my_first_state”, and the Overmind database were available, you would use the following command: 

bin/comp_admin name=test_comp_00l save_state my_first_state 
If the Overmind database were not available and test_comp_001 were a VM configured to use ESXi 
snapshots, then in addition to the computer name, you would need to provide the ESXi host name, 
username and password. If instead it were a physical computer configured to use clonezilla snapshots, 
you would need to provide the hostname, username and password of the PDU it’s connected to, the 
identifier of the specific outlet it’s connected to, and its MAC address (needed by clonezilla to facilitate 
the netboot process). 

4.8.2 Controlling Power 

For controlling power to a computer, the following commands are available: 

• power_on: Turns a computer on. 

• power_off: Turns a computer off. 

• power_cycle: Cycles a computer’s power, finishing with the computer on. For VMs, if the 
computer is already on, this means a “reset” operation. In general, if the computer is off, then 
power_cycle is the same as power_on. 

• power_state: Tells you what power state the computer is in. Possible values are “on”, “off”, 
“suspended” (VMs only) and "unknown” (used for physical computers connected to a PDU, since 
the PDU only tells us whether the outlet the computer is connected to is powered, but not 
conclusively whether the computer itself is on). 

• suspend: Suspends a computer (only for VMs). 

• resume: Starts a suspended computer back up (only for VMs). 

4.8.3 Controlling Saved States 

comp_admin also provides the ability to manage saved states for computer (e.g., for VMs, snapshots or 
for physical computers, disk images of some sort). The type of saved states used for a computer are 
determined by the computer’s state_type field value. The reason for a separate field determining 
this (as opposed to just inferring the state type from the computer’s hwtype and model fields) is that 
some state types can apply to different types of computers. For example, one can use clonezilla disk 
images with physical machines or VMs (though the reverse is not true: one cannot use ESXi VM 
snapshots with physical machines). 
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Of the following commands, only the restore_State operation is guaranteed to be available. All 
other operations may or may not be supported, depending upon state control implementation. 

To restore a state, use the restore_State command and provide the name of the state to restore as 
an argument, e.g.: 

bin/comp_admin name=test_comp restore_state old_state 
For ESXi VMs only, if no state name is given as an argument, or the special state name “latest” is given, 
then the VM will be reverted to whatever its current snapshot is. 

To save a computer’s state, use the save_state command, with the name of the state as an 
argument. For example: 

bin/comp_admin name=test_comp save_state new_state 

For ESXi VMs only, the following keyword arguments are also supported: 

• memory: Boolean which determines whether or not the VM’s memory should be stored in the 
snapshot (in other words, is it a “powered on” or “powered off” snapshot?); default is True. 

• quiesce: Boolean which sets ESXi’s “quiesce” option; default is False. 

To delete a state, use the delete_State command with the name of the state to delete as an 
argument, e.g.: 

bin/comp_admin name=test_comp delete_state old_state 
For ESXi VMs only, the keyword argument delete_children is supported. This Boolean argument 
determines whether only the given snapshot, or the given snapshot plus all its children in the snapshot 
tree should be deleted. Default is False (only delete the given snapshot). 

Note that the delete_State operation has different effect for different state control 
implementations. For ESXi VM snapshots, this deletes a snapshot from a specific VM. For clonezilla 
states, this deletes a clonezilla disk image from the image storage directory. If other computers also 
depend upon that disk image, it will not be available for them either. This difference is due to the fact 
that VM snapshots are inherently tied to one and only one VM, whereas clonezilla disk images may be 
applicable to multiple computers. 

To see what states are available for a computer, use list_states, e.g.: 

bin/comp_admin name=test_comp list_states 
To see whether a computer has a specific saved state available to it, use has_state with the name of 
the desired state, e.g.: 

bin/comp_admin name=test_comp has_state some_state 
To see the current state of a computer (currently only applicable to ESXi VMs), use cur rent_state, 
e.g.: 

bin/comp_admin name=test_comp current_state 

To rename a saved state, use rename_state, providing it with the old and new names, e.g.: 
bin/comp_admin name=test_comp old_state new_state 
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4.8.4 Advisory Notes 

There is nothing in comp_admin which will prevent you from doing something that would mess up 
somebody else’s test. For example, if a computer were in the middle of running an automated test and 
you were to run the power_of f operation against it, the computer would power off in the middle of a 
test, thus invalidating that test. To avoid this, check the Overview Management page to see what 
computers are in use and reserve a specific computer before you work with it. 

Operations run in comp_admin do not make any changes to the Overmind database. Thus, certain 
comp_admin operations can delete things which an Overmind test range depends upon. For example, 
you could, using comp_admin, delete a clonezilla disk image that multiple physical machine resources 
depend upon, meaning that future tests run through Overmind that need to use that disk image would 
fail. 

Operations involving clonezilla saved states currently only work when run on the test server as root (we 
have plans to improve this situation in a future release). That is, you cannot manage clonezilla saved 
states by running comp_admin on, for example, a developer workstation, nor by running as regular user 
even when on the test server. 

4.9 Purging Tests 

At times you may find it necessary to kill or “purge” some running tests. One reason this could occur is if 
you have to reserve some VMs to do maintenance on them, but some tests have already been 
committed which depend upon those VMs. In that case, the tests will never be able to run because a 
resource the test scheduler thought would be available to the tests is no longer available. 

To kill the tests, you first need to determine what user name the tests were submitted under. The 
username is the string before the first underscore of a test namespace name (as seen on the overview 
Namespaces page) and should match one of the subdirectories in the commits directory on the test 
server. For example, given a namespace name “jdoe_Sep26_131054_someplan”, the commits directory 
the test is running out of would be /proj/testing/commits/jdoe, following our previous 
examples. You can use this username jdoe in conjunction with remote_commit’s purge 
subcommand to purge tests running out of that commits directory. This command can be invoked from 
any tyworkflow clone on a system which has network access to the test server; you do not need to be in 
the user’s commits subdirectory on the test server to run this command since remote commit runs the 
commands remotely. 

To purge all tests running out of a user’s directory, run: 
bin/remote_commit -u jdoe purge 

If you only wish to purge tests running in a specific namespace, obtain the namespace ID ("nid”) from 
the overview Namespaces page and specify it with the nid keyword like so: 

bin/remote_commit -u jdoe purge nid=1234 
(Usually, there will only be one namespace running from a given commits subdirectory, though it is 
possible to have multiple). To purge only a specific test plan, get the plan ID (“pid”) from overview (from 
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the Namespace page, click on the namespace the test is running in, then look in the “pid” column), and 
use the pid keyword: 

bin/remote_commit -u jdoe purge pid=1234 
Finally, one can purge only a single test case with the testcase ID (“tid”), obtained from overview by 
clicking on a plan name and noting the "tid” column value, then using the tid keyword: 
bin/remote_commit -u jdoe purge tid=1234 

4.10 Restarting & Shutting Down Remote Overmind Instances 

Occasionally you may find it necessary to restart or shut down a user’s remote Overmind instance. When 
remote commit is used, an instance of Overmind runs out of a user’s commits subdirectory. When you 
know the name of the user’s commits subdirectory (for example, continuing from the previous section, 
“jdoe”), you can use the - u option as before with the stop subcommand, like so: 

bin/remote_commit -u jdoe stop 
You can also restart the remote overmind instance with the restart subcommand. 
bin/remote_commit -u jdoe restart 

In the rare case in which you need to manually kill a user’s Overmind instance, you can determine the 
PID of the Overmind instance by looking for running processes containing the words “overmind” and the 
username. For example, to get the Overmind running out of jdoe2’s commits directory, run: 

ps aux | grep jdoe2 | grep overmind | grep python 

In the resulting list, you should see a process containing 

python -m tyworkflow. overmind . main 

and whose path also contains that user’s commits subdirectory (e.g. 

/proj /testing/commits/j doe2). That is the process you want to kill. 

4.11Fixing Broken Saved States 

Sometimes you may find that a test resource has a problem which is baked in to the saved state being 
used with the resource (e.g. an ESXi VM snapshot has a bad palantir install, or a clonezilla image has a 
problematic OS setting). This section details how to apply fixes to saved states, since the process differs 
between ESXi VMs and physical machines using clonezilla images. 

4.11.1 ESXi VM Snapshots 

The process for fixing broken ESXi snapshots is relatively uninvolved and can be performed entirely 
within the vSphere client: 

• Revert the VM to the broken state. 

• Make whatever changes are needed to fix the problem (e.g. reinstall palantir). 

• Take a new snapshot of the VM. 

• Delete the broken snapshot (not strictly necessary, but advised to avoid running out of 
disk space on your ESXi server). 

Because your ESXi test resources are all set to use the special “latest” snapshot, no changes to the 
database are necessary. If you did have ESXi test resources with named snapshots, you could get around 
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modifying the database by renaming the broken snapshot to some other name, and creating the new, 
fixed snapshot with the same name the broken snapshot originally had. 

4.11.2 Clonezilla Images 

The process for fixing broken clonezilla images is more complicated due to the fact that a given clonezilla 
image can apply to multiple physical computers. 

• Restore the broken state to one of the physical machines of the correct model. This can be done 
using the comp_admin tool run from the root of tybase: 

bin/comp_admin name=<COMPUTER> restore_state <STATE> 
where <COMPUTER> is the name of the computer (same as when you imported) and 
<STATE> is the name of the broken state. 

• On the computer you restored to, make whatever changes are needed to fix the problem. 

• Rename the broken state using comp_admin. Doing this will allow you to save the new, fixed 
state with the same name as the original broken state, so you will not have to make any changes 
to the database. 

bin/comp_admin name=<COMPUTER> rename_state <OLD_STATE> <NEW_STATE> 
where <OLD_STATE> is the original name of the broken state and <NEW_STATE> is some 
new name (perhaps just the old with "-BROKEN” appended). 

• Use thecreate_resource tool from the root of tyworkflow to take a new image of this 
computer. 

bin/create_resource <COMPUTER> <RECIPE> <STATE> 
where <COMPUTER> is the same as above, <RECIPE> is the name of the recipe in the saved 
state which you fixed, and <STATE> is the original name of the broken state. 

• Use comp_admin to delete the broken state from the clonezilla image storage directory (not 
strictly necessary, but advised to save disk space): 

bin/comp_admin delete_state name=<COMPUTER> delete_state <STATE> 
where <STATE> is now the new name of the broken state (the <NEW_STATE> argument you 
gave to the rename_state subcommand earlier). 

4.12 Cleaning Up After Terminated Clone Operations 

Although Tyrant clone support is able to gracefully clean up if an error is encountered during cloning, it 
lacks the capability to gracefully clean up if terminated (e.g. with a SIGKILL or SIGTERM or a Ctrl+C on the 
command line). This means that if a running clone_admin create process is killed, or an Overmind 
server is killed in the middle of running one or more test-result-based clone post-test operations, the 
range can be left in an inconsistent state. The range will still function properly (e.g. users will be able to 
submit tests), but there will be artifacts left over that should have been cleaned up. 

Thus, we provide a set of steps to perform to clean up after a clone process that was terminated. How 
many of these steps you will need to perform depends upon how far along the clone process was at the 
time it was terminated. We readily acknowledge that manually cleaning up after a terminated clone 
process is complicated. For this reason, we strongly recommend that you not terminate a clone process, 
or terminate a running Overmind server in the middle of running post-test clone operations. 
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• First, determine the name of the source and clone computers. You can do this by checking 
clone.log (for clones initated with bin/ clone_admin run from the test server’s tyworkflow 
clone or clones initiated from Overview), or clone_xmlrpcd . log (for clones initiated by 
overmind as part of the post-test clone feature). In both cases, the log file will be located in the 
root of the test server’s tyworkflow clone (e.g. /proj/testing/tyworkflow). Make note 
of the time at which you terminated the clone operation(s) to help you look in the right place in 
the log file. 

• Determine the MAC and IP address which were chosen for the clone. If there is an entry in the 
database for the clone, you can simply look on Overview. Otherwise, you will need to look in the 
same log file as the previous point. 

• If the clone has a resource entry in the database, delete it. This can be done via the 
Management page of Overview, or via the del_snapshot and del_computer 
subcommands to bin/db_admin. If using bin/db_admin, you must run del_snapshot 
first, followed by del_COmputer. For both, provide only the computer name as argument. 

For example: 

bin/db_admin del_snapshot cloneOl 
bin/db_admin del_computer cloneOl 

• Delete the clone’s IP and MAC addresses from the database. This must be done using two calls 
of the del_attr subcommand to bin/db_admin, like so: 

bin/db_admin del_attr ip <CL0NE_IP> 
bin/db_admin del_attr mac <CLONE_MAC> 
where <CLONE_IP> and <CLONE_MAC> are the IP and MAC addresses selected for the clone. 

• If the source computer is reserved, unreserve it. If it is suspended, resume it (e.g. via the 
vSphere Client). 

• Check the vSphere client to see if the clone exists as a VM. If so, right-click it and choose “Delete 
from Disk”. 

• Browse the ESXi datastore on which the source resides to see if there is a directory named for 
the clone. If so, delete the clone directory and its contents. 

4.13 Debugging and Troubleshooting 

4.13.1 Palantir Connection Errors 

• Check that palantir is actually running on the resource you’re trying to connect to. In Linux, run 
“ps aux | grep palantir”. You should see a process running “python -m tybase. palantir”. In 
Windows, check the Task Manager process list for two "pythonw.exe” processes, one running as 
SYSTEM, and one running as the currently-logged-in user (if no user is currently logged in, then 
there will only be the SYSTEM instance). If it’s not running, then 

o for Windows 

■ start the system instance by running net start palantir 

■ start the user instance by logging out and back in 
o for Linux: run /etc/init . d/palantir start 
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• Check that any firewall on the system is not blocking palantir connections (ports 51134 and 
51135 [the latter only on Windows]), or blocking the palantir process from using the network 
(process running pythonw.exe). 

• Check that you can connect to the network on the test resource at all. Can you ping the test 
resource from your workstation? Can the test resource ping your workstation? 

4.13.2 Overmind/Overview/Reaper Issues 

4.13.2.1 Problems Connecting to the Database 

• From the machine on which overmind/overview/reaper are running, can you log in to the 
MySQL server specified in the tyworkflow’s db.rc file using the username and password specified 
there? If not, check your MySQL server permissions to make sure the user you’ve specified in 
db.rc is able to log in remotely and create and delete databases. 

• Isa firewall enabled on the server running MySQL that’s preventing remote connections to the 
MySQL server? 

• Is the database server running? If not, start it and ensure it’s set to start on boot. 

• Is iptables configured to allow connections to the mysql server (default port 3306)? 

4.13.2.2 Errors When Viewing Overview 

• Check the apache httpd error log (by default, /etc/httpd/er ror_log) and address any 
issues presented therein. 

• Ensure httpd is running and is set to run at boot. 

• Ensure httpd is configured to follow symlinks (otherwise, httpd will refuse to follow the symlink 
and will give you an access denied error instead). 

• Ensure iptables, if running, is configured to allow incoming connections on port 80. 

4.13.2.3 General Issues 

• First, check the logs. The logs for the reaper instance in charge of reverting resources is located 
in the tyworkflow clone in the test server's remote commit environment (e.g. 

/proj /testing/tyworkf low), named reaper . log. If there are errors connecting to an 
ESXi server, they will show up here (and as a symptom, you'll see tests complete with the 
“skipped” result due to being unable to set a resource to a clean state before running). 

• Each developer's remote Overmind instance has a log named overmind.log in tyworkflow in the 
developer's commit directory (e.g. 

/proj /testing/commits/ j doe/tyworkf low/overmind . log). This file will show 
problems with Overmind scheduling. 

• If test cases take a long time to complete and result in the skipped status and the message 
“host(s) not reaped”, that indicates a problem with the Reaper. Check to make sure the Reaper 
is running. From the tyworkflow clone in the remote commit root directory on the test server, 
the command 

bin/reaper_admin service_ping 

should return True. If it doesn’t, the Reaper isn’t running. Also, check the reaper . log in 
that same directory for any errors. Ensure that the login information placed in rc/reaper . rc 
for each ESXi server is correct. 
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• If resources show up on the Resources or Management overview pages as having “fubar” status, 
this usually means an error occurred reverting the machine to a clean state. This can happen if 
the reaper is not working properly. Check to make sure the reaper is running (e.g. by running 
/etc/init . d/reaper status or systemctl status reaper). Also, check the 
reaper log (reaper . log in the tyworkflow directory on the test server which reaper is running 
out of, e.g. /proj/testing/tyworkflow) for errors. If there are errors in the reaper 
configuration, you’ll see something about it here. 

0 If it is simply due to a reaper problem, then after fixing the reaper, you can make the 
machines available again by checking the box next to each one on the overview 
Management page and then clicking the “Reset FUBAR” button on the left side. 

• Another reason reverting could fail, and machines could be marked fubar, is if you forgot to 
snapshot your VMs before using them (i.e. if the VM has no snapshot to revert to, then reverting 
will fail). Make snapshots of the failed machines, then reset them as above, by checking them 
and then clicking “Reset FUBAR”. 

• When using the 'remote commit’ functionality, time differences between the workstation 
submitting the tests and the test server will cause tests to fail. The supported configuration is to 
have all machines times synchronized via an NTP server. 

4.13.3 Test Script Issues 

• Check the script and undermine logs for an individual test. These can reveal some range setup 
problems, as well as problems unrelated to range setup. To see the logs for a specific test case, 
start at the Namespace overview page, click on the name of the namespace the test is in, then 
the name of the plan in that namespace (there will usually be only one per namespace), then 
the specific test case, and then the script . log or undermine . log links in the “Undermine 
Logs” section. 

0 If Palantir installation to a resource doesn't complete properly, this will show up in the 
undermine log as an inability to connect to the resource. The log will contain repeated 
attempts to connect, and the test will eventually end with a timeout). 

• If you receive an error status with the message “.is_clean failed”, this indicates that the resource 
the test attempted to run on is marked as dirty. When Overmind runs undermine to perform a 
test, it instructs undermine to verify that the machine is not dirty (i.e. not having previously run 
a test without being reaped) by checking for a marker on the resource. If you receive this error, 
then either the Reaper is not working properly and falsely returning success, or an operation was 
performed on an asset which incorrectly left a dirty marker. If you know the asset is clean, you 
can manually remove the dirty marker by deleting the dirty_marker . xxx directory from 
the installed Palantir directory on the resource (C : \palantir_service for Windows, 
/palantir_service for Linux). In this case, you’ll likely want to take a new snapshot since 
the existing snapshot was probably taken with the dirty marker on the box. 

4.13.4 General Issues 

• If on a Linux system, ensure selinux is turned off. 

• Have you run make in tybase and tyworkflow? 
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5 Tyrant Metrics 

5.1 Historical Resource and Recipe Usage 

These metrics show the percentage of time a resource or recipe was used in test cases for a specified 
period of time. Additionally, the metric results display the average, minimum, and maximum run times 
for a resource. 

This metric can allow administrators to determine resources which have been most in demand. For 
example: “resource 00-04-win-xp-l-x86-en has been used in 50% of scheduled tests within the past 30 
days” or “resources with recipe windows-xp-l-x86-en have been used in 62% of scheduled tests within 
the past eight hours”. 

To view historical resource and recipe usage, navigate to metrics_resource_usage.php 
(Metrics D Computer/Recipe Usage Metrics) 


Computer/Recipe Usage Metrics 




Select a start date and time and an end date and time. The metric results include the tests run within the 
start and end dates. The usage span is calculated by the difference between the start and end dates. The 
default end date is the current date and time. The default start date is one week before the current date 
and time. 

Select the metric mode to view results. The Resource mode calculates usage according to the resource 
(computer and recipe combination). The Recipe mode calculates usage grouped by recipe attributes. For 
example, in Recipe Mode, when “os” is selected, the usage results is the total non-overlapping time all 
resources with the same “os” attribute in test over the total selected period of time. Furthermore, when 
“os” AND "arch” is selected, the usage results is the total non-overlapping time all resources with the 
same “os” and “arch” over the total selected period of time. 

When the Recipe Mode is selected, the Recipe Attribute list appears. You must select one or more 
attributes. To select multiple attributes, hold the CTRL key and click the desired item. 
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Enter the limit and the filter string to view usage metrics for only specific resources or recipe attributes. 

5.2 Test Solving Contentions 

The Overmind server consists of a main processing loop which, among other things, schedules tests. At 
each iteration of this loop, the server attempts to schedule all the combos for a single plan, up to the 
specified samples limit. For each combo, the server attempts to "solve” the combo, which is our term 
for the process of finding a set of resources which match the host constraints in the combo. At any given 
time, some combos may fail to solve. If this occurs, we record a "contention hit” against the resource 
attributes the combo requires. For example, if a combo requires a 32-bit Windows host, and no 32-bit 
Windows hosts are available, we would record a contention hit for the 32-bit architecture and a 
contention hit for the Windows family. Over time, the host attributes which are most frequently needed 
but not available will rise to the top (due to having a greater contention hit count), giving range 
administrators an idea of what kinds of resources they should add to their range in order to increase test 
throughput. 
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To view contention metrics, navigate to metrics_contentions.php (Metrics a Contentions Metrics) 


Contention Metrics 



The Time-Based Plots section shows contention hits for a specific attribute name and value. 

Select the start date to vary the range in the contentions plot graph. The default start date is a few days 
before the current date and time. 

Select the small interval to display the interval in minutes between contention hit counts. NOTE: If the 
administrator choses to record contention hit counts larger than the small interval, there may be gaps in 
hit counts. 

Select the attribute name and then attribute value. Selecting the attribute name populates the values 
selection list. If contention hits never occurred for an attribute, it will not appear in the list. 
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The Total Contention Hits section shows the total contention hits for attribute values grouped by 
attribute name. 


5.3 Computer Reservation History and Metrics 

Resources are reserved at the computer level. A computer (and therefore any resources which involve 
that computer) may be reserved by a user at any time. 


To view all records of computer reservation history, navigate to reservation_history.php (Managed 
Resources D Computer Reservation History) 


Computer Reservation History 



To view metrics from computer reservation history, navigate to reservation_history_metrics.php (Metrics 
a Reservation History Metrics) 

The reservation time span metric allows the administrator to know how often and how long a resource 
has been manually reserved within a given period of time. This metric indicates to the administrator the 
demand for a resource outside the scope of Overmind as well as its unavailability to other users and 
tests. Because Overmind reservations are made at a computer level (not resource level) there will be 
only a Computer view for this metric. 


Computer Reservation History Metrics 




Select a start and end date and time to change the time span to calculate reservation history metrics. 
Enter a limit number and filter string to show only specific computers. 

The reservation history metrics can be grouped by Computer ID, Reserve Name, or both. 
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6 Appendix A - ESXi Physical Hardware to Range Size 
Charts 


6.1 Test Range Size: 50-100 Nodes 


Test Server (e.g. R310) 

1 ESXi Server (e.g. R510) 

• 1 TB Disk Space 

• 2 TB Disk Space 

• 24 GB RAM 

• 64 GB RAM 

• 3 GigE NIC 

• 3 GigE NIC 

• 2 Intel Xeon Quad-core (e.g. X3400 series) 

• ESX Standard License (2 CPU per server) 


• 2 Intel Xeon Quad-core (e.g. E5500 series) 


6.2 Test Range Size: 100-500 Nodes 


Test Server (e.g. R610) 

• 3 TB Disk Space 

• 32 GB RAM 

• 3 GigE NIC 

• 2 Intel Xeon Quad-core (e.g. E5600 series) 

VCenter Server for VM Management (e.g. R310) 

• 500 GB Disk Space 

• 24 GB RAM 

• 1 GigE NIC 

• vCenter Standard License 

• 2 Intel Xeon Quad-core (e.g. X3400 series) 

2-6 ESXi Server (e.g. R510) 

• 2 TB Disk Space 

• 64 GB RAM 

• 3 GigE NIC 

• ESX Standard License (2 CPU per server) 

• 2 Intel Xeon Quad-core (e.g. E5500 series) 



6.3 Test Range Size: 500-1000 Nodes 


Test Server (e.g. R610) 

• 4 TB Disk Space 

• 64 GB RAM 

• 4 GigE NIC 

• 2 Intel Xeon Quad-core (e.g. E5600 series) 

VCenter Server for VM Management (e.g. R310) 

• Enterprise-Class Database Server (e.g. Oracle) 

• 500 GB Disk Space 

• 32 GB RAM 

• 1 GigE NIC 

• vCenter Standard License 

• 2 Intel Xeon Quad-core (e.g. X3400 series) 

ESXi Servers: (high-end boxes host more 

VMs; low-end boxes host less; use a mix) 

10-20 Low End (e.g. R510) 

6-12 High End (e.g. R710) 

• 2 TB Disk Space 

• 3 TB Disk Space 

• 64 GB RAM 

• 128 GB RAM 

• 3 GigE NIC 

• 3 GigE NIC 

• ESX Standard License (2 CPU per server) 

• ESX Standard License (2 CPU per server) 

• 2 Intel Xeon Quad-core (e.g. E5500 series) 

• 2 Intel Xeon Quad-core (e.g. X5600 series) 


6.4 Test Range Size: 1000-5000 Nodes 


Test Server (e.g. R610) 

VCenter Server for VM Management (e.g. R310) 

• 10 TB Disk Space 

• Enterprise-Class Database Server (e.g. Oracle) 

• 64 GB RAM 

• 1.5 TB Disk Space 

• 4 GigE NIC 

• 48 GB RAM 

• 2 Intel Xeon Quad-core (e.g. E5600 series) 

• 1 GigE NIC 


• vCenter Standard License 


• 2 Intel Xeon Quad-core (e.g. X3400 series) 
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12-24 ESXi Servers (e.g. R720) 


• 3 TB Disk Space 


• 128 GB RAM 


• 3 GigE NIC 


• ESX Standard License (2 CPU per server) 


• 2 Intel Xeon Six-core (e.g. E5-2600 series) 
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7 Appendix B - Detailed Repository Layouts 

7.1 tybase 

• bin: Shell scripts used to run Tyrant components present in tybase 

• docs: Some documentation which is superseded by this manual 

• leafbags: Directory in which collections of test scripts and plans are linked in, making them 
available to be run by undermine or overmind 

• media: Directory in which third-party supporting media is included or linked in 

0 lib_esxi-0.1: library which allows controlling ESXi servers via the vSphere API 

• PythonLocal: Built-in python distribution used by Tyrant. This directory is only present after 
running make. 

• rc: Configuration files for components in tybase. 

0 defaults: Default settings which are checked into the repository; these are overridden by 
settings in the files directly in rc. 

• src: Source code for Tyrant components in tybase. This directory is present on the python path 
when running any Tyrant components. 

0 leafbag: A collection of built-in leafnodes. 

0 tybase 

■ hal: Source code for the HAL component, used to perform computer-level 
operations on test resources 

■ palantir: Source code for the palantir component, used to run operations on test 
resources 

• installer: The code used to install palantir on test resources, as well as a 
pre-built python for the OS/architecture combinations supported by 
tybase. 

■ support: Supporting modules used by various parts of Tyrant. 

■ undermine: Source code for parsing and running leafnodes. 

• test: A collection of regression tests for tybase components. 

7.2 tyworkflow 

The tyworkflow repository is structured similarly to tybase. The list below highlights the differences. 

• install: Code used to install overmind and reaper as a system service. 

• src 

0 leafbag: Built-in leafbag containing test scripts and plans for verifying a range is properly 
set up. 

0 tyworkflow 

■ overmind: Source code for overmind, the component which schedules tests 
across shared resources. 

■ overview: Source code for the web gui used to see test results and manage a 
range. 
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■ overview_httpd: Built-in web server for running overview in small environments 
(e.g. on a laptop while traveling). 

■ reaper: Source code for the reaper component, which handles sanitizing 
resources prior to a test. 

■ resource_manager: Source code for the component which tracks the state of 
test resources using a database. 

■ support: Supporting code for various components of tyworkflow. 

7.3 tyutils/event jdetection 

• config: TYUTILS configuration files. 

0 defaults: Default settings for tyutils event detection; overridden by the files directly in 
config. 

• event_detectors: Modules implementing the two types of event detection. 

• harness: A leafnode which can be used to wrap your own leafnode with event detection logic, as 
well as some code for testing event detection. 

• util: Common utility functions used by event detectors and harness scripts. 

7.4 PIL-* 

These repositories each contain a PIL subdirectory in which reside the Python Imaging Library code and 
compiled components. 

7.5 upsync 

• bin: scripts used to run the tools included in upsync 

• leafbag 

0 upsync 

■ db_admin: code for interacting with upsync-specific information in the overmind 
resource database 

■ exporter: code for the low-side resource export tool 

■ importer: code for the high-side resource import tool 

■ updater: code for the low-side automatic update tool 

7.6 upsync_apps 

• leafbag 

0 upsync_apps: container for application support packages; each subdirectory provides 
automatic internet-connected update support for a specific app 

■ avira_IS_2012: support for Avira Internet Security 2012 

■ COMMON: place for common code shared among app support packages 

■ kaspersky_IS_2012: support for Kaspersky Internet Security 2012 

■ mcafee_VSent_8_8: support for McAfee VirusScan Enterprise 8.8 

■ windows: support for updating the Windows OS 
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8 Appendix C - Event Detection Setup 

Via an add-on repository called "tyutils”, Tyrant provides the ability to run arbitrary test scripts in a 
wrapper which monitors the video output of a test resource for screen changes. This works by 
periodically taking screenshots according to a configurable interval and then comparing the screenshots 
over time to see what changes occur. Event Detection libraries were extracted from the “magnum” add- 
on repository. One can run event detection without “magnum”. 

Tyutils includes the event_detection library which provides two methods of acquiring screenshots: using 
native Windows functionality to take screenshots (which only works for Windows resources and can be 
affected by conditions on the resource being tested, but can work on VMs and physical machines alike) 
and using ESXi screenshot functionality (which only works for VMs, but works for all OS families and is 
unaffected by conditions on the resource being tested). Here, we cover how to setup and verify ESXi- 
based event detection. 

8.1 Assumptions 

This appendix makes the following assumptions: 

• All of the ESXi servers in your range have an account with the same username and password 
which has privileges to take screenshots of VMs. 

• Your test server uses NFSv3 (if it uses NFSv4, you will need to look up instructions on how to 
configure it). 

8.2 Range Setup 

First, on the test server, export (e.g. “share”) a directory via NFS. Following convention in this document, 
we’ll use /proj/testing/tyrantshare. 

• Create the directory /proj/testing/tyrantshare. 

• Add the following line to /etc/exports (create that file if it doesn’t exist): 

/proj/testing/tyrantshare * ( rw, sync, no_root_squash) 

• Runexportfs -a. 

• Check if nfs is running with one of the following commands (depending upon Linux distro in use): 

/etc/init . d/nfs status 
systemctl status nfs 
If it’s not running, then start it with one of the following: 

/etc/init . d/nfs start 
systemctl start nfs 

• If nfs isn’t set to start at boot, set it to do so with one of the following: 

chkconfig --levels 2345 nfs on 
systemctl enable nfs 

• If iptables is running, ensure nfs traffic is allowed through. 
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Next, on each ESXi server in the range, add this NFS share as a datastore. In brief, this is done by 
connecting to each ESXi server (or a central vCenter server) with the vSphere client, going to the 
Configuration tab for each ESXi server, choosing the Storage section, choosing to Add Storage, selecting 
the NFS datastore type, entering the hostname or IP address of the test server and entering 
/proj /testing/tyrantshare as the directory path. Name the datastore “tyrantshare”. 

8.3 Verification 

On a developer workstation (perhaps at the same location you used when verifying the initial range 
setup in the main part of this manual), we’ll run a test of event detection to verify it’s working. 

• In the same directory where your tyworkflow and tybase clones are, clone the tyutils repository 

(hg clone http : //testserver . example . com : 8000/tyutils). 

• In that same directory, also clone the provided PIL (Python Imaging Library) repository matching 
the architecture of the test server. For example, if your test server is running 32-bit Linux, 
choose the PIL-linux-i686 repository, but if it’s running 64-bit Linux, use PIL-linux- 
x86_64. This library is used to compare screenshots to find changes. 

• create the file leaf bag/tyutils/event_detection/conf ig/main . conf and set 
the following settings in it. Reference 

leaf bag/tyutils/event_detection/conf ig/def aults/main . conf for more 
details and example syntax: 

0 In the tester section: 

■ Set esxi_evdet_ds_name to the name of the NFS datastore you created in 
the previous section (e.g. tyrantshare). 

■ Set esxi_evdet_local_ds to the path on the test server of the directory 
what was exported via NFS in the previous section (e.g. 

/proj /testing/tyrantshare). 

• In your tyutils clone, edit the tile 

leafbags/tyutils/event_detection/harness/test/evdet_test_plan . py. 
In the paramslots variable, change USERNAME and PASSWORD to the username and 
password used to log in to each ESXi to take screenshots. Also in this file, remove the line under 
paramslots which sets esxi_host. 

• Change directory to your tybase clone and run the following commands to link in tyutils: 

In -s ../../tyutils media/tyutils 

In -s . ./. ./PIL_REPO leafbags/PIL (where PI L_REPO is whichever PIL 
repository you cloned) 

• In your tyworkflow clone, run media/tybase/media/tyutils/leafbag/makejinks.sh. 

• Change directory to your tyworkflow clone and use remote commit to run the 

evdet_test_plan: 

bin/remote_commit run 

tyutils . event_detection . harness . test . evdet_test_plan 
By default, this test plan will only run one combo. To have it run more, add the samples 
keyword argument on the end with the desired number of samples (e.g. samples=5). 
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• Browse to the overview Namespaces page (e.g. 

http://testserver.example.com/overview/test_namespaces.php) in a web browser and watch for 
your test(s) to complete. When a test completes, navigate to its testcase detail page (i.e. click on 
the namespace name, then on the plan name, then on the testcase name). You should see a few 
screenshots, one of which should show Internet Explorer. If this is the case, event detection is 
working properly. 

8.4 Finishing Up 

Now that you’ve developed a working tyutils configuration for the range, commit this change to the 
tyutils repository so others can get it automatically. From the root of your tyutils clone, run: 
hg commit -m 'validated tyutils configuration' 
hg push 
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9 Appendix D - Automatic Update and Resource 
Export/Import (Upsync) 

The upsync system provides the ability to automatically update software on low-side, internet-connected 
resources (currently only virtual machines) and then move those resources over to a high-side range 
where they can be used for testing. Upsync functionality is broken into two areas: automatic update 
and resource export/import. 


9.1 Upsync Setup and Administration 

A typical upsync setup consists of two separate Overmind-controlled ranges: a low-side, internet- 
connected range on which automatic updates are performed and from which resources are exported; 
and a high-side testing range into which the low-side-exported resources are imported and on which 
testing is performed. 



IMPORTANT: See the separate Upsync documentation for complete details on how to set up and 
administer an upsync system. 

9.2 Upsync Usage 

(NOTE: This section serves as a quick reference guide for routine upsync usage features. Additional 
options, detail, and explanation is included in the separate Upsync documentation.) 

9.2.1 Low Side Updating 

Automatic updates against the low-side range will be run via Overmind with the autoplan tool . 
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To run the update_f or_export leafnode against all resources in the range via remote_commit, 
run the following command from the root of tyworkflow on the developer workstation: 

bin/remote_commit autorun upsync . updater . update_for_export -H '' samples=-l 

The quoted empty string argument to - H means no filter for that host slot (i.e. match all resources). 
Note that certain field values require special quoting. See the separate user manual for more 
information. 

9.2.2 Low-Side Exporting 

After the low side has been set up and resources have been updated, you can perform an export of 
resources on an Overmind range by mounting the transfer drive, running the export command line 
tool, and then unmounting the transfer drive in preparation for moving it to the high side. 

After mounting the transfer drive, use the upsync export tool to identify and stage resources that are 
destined for the high side. From the base upsync directory: 

bin/export 

You will be prompted to review the export machine list and enter Y or N to continue. The export process 
will not continue until it receives a user input response. Enter "y” to continue, or “n” to cancel. If you 
wish to bypass all prompts and always continue with the export process without user input, run the 
export process with the force option. 

bin/export -f 

When the export completes, the high-side import can begin. Unmount the transfer drive from the low- 
side, and carry it to the high side location. 

9.2.3 High-Side Importing 

Once you have exported resources on a transfer drive, you can import them to a high-side Overmind 
range by running the import command line tool. Before running the import tool, mount the transfer 
drive on the high side. 

Run the tool with only the staging ground option. 

bin/import -i /mnt/transfer-drive/20140117174347 

In the beginning of the import process, before any actions are taken, you will be prompted to review the 
operations list and enter Y or N to continue. Operations will be performed in the order listed on output. 

If you wish to bypass all prompts and always continue with the export process without user input, run 
the export process with the force option. 

bin/import -i /mnt/transfer-drive/20140117174347 -f 
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After all the import operations are finished, the result for each operation will appear on the terminal and 
in the log file. 
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10 Appendix F - Windows Active Directory Setup 

1. Install 2k3/2k8 Server 

2. Run "dcpromo”. 

3. Click “next”. 
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5. Select new domain if no trust relationships needed. 
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Select forest functional level. 
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8. Select Domain Functional Level. 



9. If this is the first/only Domain Controller, you must select DNS server. 
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10. Configure the server with a static IP address. This is the best practice for Domain Controllers. 



11. If this is a standalone network with no upstream DNS, select “yes” otherwise resolve upstream 
DNS issues. 



12. Select Defaults. 
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13. Enter password. Password must meet complexity requirements. 
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14. Review summary and click “next”. 
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15. Click Finish and restart when completed. 
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11 Appendix G - Joining Test Server to Active Directory 
Domain 

This section outlines instructions for setting up a Linux machine to join Active Directory. The steps apply 
to Fedora 18 and newer. 

1. Make sure the following ports are opened in the firewall settings: 

• TCP: 80, 88, 389 

• UDP: 88 (at least one of TCP/UDP ports 88 has to be open) 

Also note that following ports are necessary for ipa-client working properly after enrollment: 

• TCP: 464 

• UDP: 464, 123 (if NTP enabled) 

To do this, set Firewall by running the following commands: 

firewall-cmd - -add-port=80/tcp --permanent 
firewall-cmd - -add-port=88/tcp --permanent 
firewall-cmd - -add-port=389/tcp --permanent 
firewall-cmd - -add-port=464/udp --permanent 
firewall-cmd - -add-port=123/udp --permanent 


2. Run the following commands to install packages and set ntpdate. 

yum install realmd sssd gvfs ntpdate PackageKit sssd-tools 
yum install samba-common ntpdate 

3. Run the following commands to sync date to domain or ntp server, 
ntpdate <domain_name or ntp server used by domain> 

4. Ensure your computer name is set and is less than 15 characters. To do this run the following 
command: 

hostnamectl set-hostname <new host name> --static 
hostnamectl set-hostname <new host name> --pretty 
hostnamectl set-hostname <new host name> --transient 

5. Run the following command to join to domain: NOTE: you must use “administrator” credentials 
to join to Windows Active Directory. 

realm join -v -U <domain_admin_username> <domain_name> 

6. To view a full list of domain information run the following command: 
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realm list 

Sample output is as follows: 
bt . net 

type: kerberos 
realm-name: BT.NET 
domain-name: bt.net 
configured: kerberos-member 
server -software : active -directory 
client-software: sssd 
required-package: sssd-tools 
required-package: sssd 
required-package: adcli 
required-package: samba-common 
login-formats: BT\%U 
login-policy: allow-any-login 

In this example, domain for this system is BT.NET of type Active-Directory. Login is "BT\<username>” and 

any valid user can login (allow-any-login). 

Realm allows you to manage enrollment in Active Directory and manage permissions and configurations. 

The following is the output of running the “realm” command. 

realm discover -v [realm-name] 

Discover available realm 

realm join -v [-U user] realm-name 
Enroll this machine in a realm 

realm leave -v [-U user] [realm-name] 

Unenroll this machine from a realm 

realm list 

List known realms 
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realm permit [-a] [-R realm] user ... 

Permit user logins 
realm deny [-a] [-R realm] 
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12 Appendix H - Securing Overview with Basic HTTP 
Authentication 

This section outlines how to setup authorization in Apache in an LDAP/Active Directory environment. 
These instructions apply to Fedora servers. 

1. Install httpd . 

yum -y install httpd openldap mod_ldap mod_authnz_external 

# create a link for Perl 

In -s /usr/bin/perl /usr/local/bin/perl 

2. Configure httpd. Open the httpd.conf file in a text editor, 
vi /etc/httpd/conf /httpd . conf 

3. Edit the following lines in httpd.conf. 

line 55: Add modules to load 

LoadModule ldap_module modules/mod_ldap. so 

LoadModule authnz_ldap_module modules/mod_authnz_ldap . so 

line 86: Admin's address 

ServerAdmin admin@bt.net 
line 95: uncomment and change to your server's name 
ServerName spartacus . bt . net : 80 
line 144: change (enable CGI and disable Indexes) 

Options FollowSymLinks ExecCGI 
line 151: change 
AllowOverride All 

line 294: uncomment and add file-type that apache looks them CGI 
AddHandler cgi-script .cgi .pi 
add at the last 

#server's header 
ServerTokens Prod 

# define on 
KeepAlive On 
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• file name it can access only with directory's name 
Directorylndex index.html index. cgi index. php 

4. Configure httpd authentication for domain/ldap user groups for the Overview pages: 
vi /etc/httpd/conf /httpd . conf 

In the example httpd.conf file below, 

• The user group “dart” is the general user of the system. 

• The user group “dartadmin” is the DART Administrator group. 

• Both groups need to be created on either the Active Directory Domain Controller or in 
LDAP config: 

<Directory "/var/www/html/overview"> 

AuthName "pvriate lgoin message here." 

AuthType Basic 
AuthBasicProvider ldap 
AuthUserFile /dev/null 

AuthLDAPURL ldap : //bt . net : 389/DC=bt , DC=net?sAMAccountName?sub? 
(objectClass=*) 

AuthLDAPBindDN admin@bt.net 
AuthLDAPBindPassword xxxxx 
AuthLDAPGroupAttributelsDN on 

Require ldap-group cn="dart", cn="users", dc="bt", dc="net" 
</Directory> 

<Directory "/var/www/html/overview/admin"> 

AuthName "private login message here." 

AuthType Basic 
AuthBasicProvider ldap 
AuthUserFile /dev/null 

AuthLDAPURL ldap : //bt . net : 389/DC=bt , DC=net?sAMAccountName?sub? 
(objectClass=*) 

AuthLDAPBindDN admin@bt.net 
AuthLDAPBindPassword xxxxx 
AuthLDAPGroupAttributelsDN on 

Require ldap-group cn="dartadmin", cn="users", dc="bt", dc="net" 
</Directory> 


5. Save the httpd.conf file and start/restart service. 
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systemctl restart httpd . service 

6. Set service to start with server 

systemctl enable httpd . service 
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